TEACHER COMMENTS AND STUDENT PERFORMANCE: A
SEVENTY-FOUR CLASSROOM EXPERIMENT IN
SCHOOL MOTIVATION¹

ELLIS BATTEN PAGE 
University of California, Los Angeles* 


Each year teachers spend millions of 
hours marking and writing comments upon 
papers being returned to students, ap- 
parently in the belief that their words 
will produce some result, in student per- 
formance, superior to that obtained with- 
out such words. Yet on this point solid 
experimental evidence, obtained under 
genuine classroom conditions, has been 
conspicuously absent. Consequently each 
teacher is free to do as he likes; one will 
comment copiously, another not at all. 
And each believes himself to be right. 

The present experiment investigated the questions: 1. Do teacher comments cause a significant improvement in student performance? 2. If comments have an effect, which comments have more than others, and what are the conditions, in students and class, conducive to such effect? The questions are obviously important for secondary education, educational psychology, learning theory, and the pressing concern of how a teacher can most effectively spend his time.

¹ Portions of this paper were read at the National Research Conference of the American Educational Research Association at San Francisco, March 8, 1958. This research depended upon cooperation from many persons. Space limitations prevent the listing of their names. The writer is especially indebted to the teachers who freely donated time and energy after having been randomly selected. Without their participation the study obviously would have been impossible.

* Where this study was conceived as part of a doctoral dissertation. The study was conducted in the San Diego City and County Schools while the writer was with San Diego Junior College. He is presently Coordinator of Guidance, Eastern Michigan College.


PREVIOUS RELATED WORK


Previous investigations of “praise” and 
“blame,” however fruitful for the general 
psychologist, have for the educator been 
encumbered by certain weaknesses: Treat- 
ments have been administered by persons 
who were extraneous to the normal class 
situation. Tests have been of a contrived 
nature in order to keep students (un- 
realistically) ignorant of the true com- 
parative quality of their work. Comments 
of praise or blame have been administered 
on a random basis, unlike the classroom 
where their administration is not at all 
random. Subjects have often lacked any 
independent measures of their perform- 
ance, unlike students in the classroom. 
Areas of training have often been those 
considered so fresh that the students would 
have little previous history of related suc- 
cess or failure, an assumption impossible 
to make in the classroom. There have 
furthermore been certain statistical errors: tests of significance have been conducted as if students were totally independent of one another, when in truth they were interacting members of a small number of groups with, very probably, some group effects upon the experimental outcome.

For the educator such experimental 
deviations from ordinary classroom condi- 
tions have some grave implications, ex- 
plored elsewhere by the present writer (5). 
Where the conditions are highly con- 
trived, no matter how tight the controls, 
efforts to apply the findings to the or- 
dinary teacher-pupil relationship are at 
best rather tenuous. This study was there- 
fore intended to fill both a psychological 
and methodological lack by leaving the 
total classroom procedures exactly what 
they would have been without the experi- 
ment, except for the written comments 
themselves. 


METHOD 


Assigning the subjects. Seventy-four 
teachers, randomly selected from among 
the secondary teachers of three districts, 
followed detailed printed instructions in 
conducting the experiment. By random procedures each teacher chose one class to be subject from among his available classes.* As one might expect, these classes
represented about equally all secondary 
grades from seventh through twelfth, and 
most of the secondary subject-matter 
fields. They contained 2,139 individual 
students. 

First the teacher administered whatever 
objective test would ordinarily come next 
in his course of study; it might be 
arithmetic, spelling, civics, or whatever. 
He collected and marked these tests in 
his usual way, so that each paper ex- 
hibited a numerical score and, on the 
basis of the score, the appropriate letter 
grade A, B, C, D, or F, each teacher 
following his usual policy of grade dis- 
tribution. Next, the teacher placed the 
papers in numerical rank order, with the 
best paper on top. He rolled a specially marked die to assign the top paper to the No Comment, Free Comment, or Specified Comment group. He rolled again, assigning the second-best paper to one of the two remaining groups. He automatically assigned the third-best paper to the one treatment group remaining. He then repeated the process of rolling and assigning with the next three papers in the class, and so on until all students were assigned.

* Certain classes, like certain teachers, would be ineligible for a priori reasons: giving no objective tests, etc.
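The assignment procedure described above amounts to a randomized-blocks design with levels of three papers. A minimal sketch in Python follows, under stated assumptions: the student identifiers and the function name are hypothetical, a shuffle stands in for the two rolls of the marked die, and papers left over after the last complete level of three are simply left unassigned (the text speaks of "useful students" but does not say how leftovers were handled).

```python
import random

TREATMENTS = ["No Comment", "Free Comment", "Specified Comment"]


def assign_treatments(scores, rng=None):
    """Assign treatments within successive levels of three ranked papers.

    scores: dict mapping a (hypothetical) student id to the first-test score.
    Returns a dict mapping student id to treatment name.
    """
    rng = rng or random.Random()
    # Place the papers in rank order, best paper on top.
    ranked = sorted(scores, key=scores.get, reverse=True)
    usable = len(ranked) - len(ranked) % 3  # assumption: drop an incomplete last level
    assignment = {}
    for start in range(0, usable, 3):
        level = ranked[start:start + 3]
        order = TREATMENTS[:]
        rng.shuffle(order)  # stands in for the die rolls within one level
        for student, treatment in zip(level, order):
            assignment[student] = treatment
    return assignment
```

Within every level each treatment occurs exactly once, so the three treatment groups are matched on first-test performance.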

Administering treatments. The teacher 
returned all test papers with the numeri- 
cal score and letter grade, as earned. No 
Comment students received nothing else. 
Free Comment students received, in addi- 
tion, whatever comment the teacher 
might feel it desirable to make. Teachers 
were instructed: “Write anything that oc- 
curs to you in the circumstances. There 
is not any ‘right’ or ‘wrong’ comment for 
this study. A comment is ‘right’ for the 
study if it conforms with your own feel- 
ings and practices.” Specified Comment 
students, regardless of teacher or student 
differences, all received comments desig- 
nated in advance for each letter grade, as 
follows: 

A: Excellent! Keep it up. 

B: Good work. Keep at it. 

C: Perhaps try to do still better? 

D: Let’s bring this up. 

F: Let’s raise this grade! 
Teachers were instructed to administer 
the comments “rapidly and automatically, 
trying not even to notice who the students 
are.” This instruction was to prevent any 
extra attention to the Specified Comment 
students, in class or out, which might con- 
found the experimental results. After the 
comments were written on each paper and 
recorded on the special sheet for the 
experimenter, the test papers were re- 
turned to the students in the teacher's 
customary way. 

It is interesting to note that the stu- 
dent subjects were totally naive. In other 
psychological experiments, while often not 
aware of precisely what is being tested, 
subjects are almost always sure that
something unusual is underway. In 69 of 
the present classes there was no discussion 
by teacher or student of the comments be- 
ing returned. In the remaining five the 
teachers gave ordinary brief instructions 
to “notice comments” and “profit by 
them,” or similar remarks. In none of the 
classes were students reported to seem 
aware or suspicious that they were ex- 
perimental subjects. 

Criterion. Comment effects were judged 
by the scores achieved on the very next 
objective test given in the class, regardless 
of the nature of that test. Since the 74 
testing instruments would naturally differ 
sharply from each other in subject matter, 
length, difficulty, and every other testing 
variable, they obviously presented some 
rather unusual problems. When the tests 
were regarded primarily as ranking in- 
struments, however, some of the diffi- 
culties disappeared. 

A class with 30 useful students, for ex- 
ample, formed just 10 levels on the basis 
of scores from the first test. Each level 
consisted of three students, with each 
student receiving a different treatment: 
No Comment, Free Comment, or Specified 
Comment. Students then achieved new 
scores on the second (criterion) test, as 
might be illustrated in Table 1, Part A. 
On the basis of such scores, they were as- 
signed rankings within levels, as illus- 
trated in Table 1, Part B. 
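The ranking step can be sketched as follows. This is only an illustration: the scores are hypothetical, the function names are not from the study, and giving tied scores the average of the ranks they span is an assumption, since the text does not say how ties were treated. The lowest score in a level is ranked 1 and the highest 3.

```python
def rank_within_level(scores):
    """Rank one level's scores from 1 (lowest) upward; ties share the average rank."""
    ordered = sorted(scores)
    ranks = []
    for score in scores:
        first = ordered.index(score) + 1   # lowest rank this score could take
        count = ordered.count(score)       # number of tied scores
        ranks.append(first + (count - 1) / 2)
    return ranks


def rank_sums(levels):
    """levels: list of (N, F, S) criterion-test scores, one triple per level."""
    sums = [0.0, 0.0, 0.0]
    for level in levels:
        for i, rank in enumerate(rank_within_level(level)):
            sums[i] += rank
    return sums


# Hypothetical second-test scores for three levels, in the order (N, F, S).
levels = [(33, 36, 38), (31, 35, 35), (30, 29, 32)]
sums = rank_sums(levels)
mean_ranks = [s / len(levels) for s in sums]
```

Dividing each sum by the number of levels gives the mean rank within treatment within class.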

If the comments had no effects, the sums of ranks of Part B would not differ except by chance, and the two-way analysis of variance by ranks would be used to determine whether such differences exceeded chance.* Then the sums of ranks themselves could be ranked. (In Part B the rankings would be 1, 3, and 2 for Groups N, F, and S; the highest score is ranked 3 throughout the study.) And a new test, of the same type, could be made of all such rankings from the 74 experimental classrooms. Such a test was for the present design the better alternative, since it allowed for the likelihood of “Type G errors” (3, pp. 9-10) in the experimental outcome. Still a third way remained to use these rankings. The summation of each column could be divided by the number of levels in the class, and the result was a mean rank within treatment within class. This score proved very useful, since it fulfilled certain requirements for parametric data.

* The present study employed a new formula,

$$\chi_r^2 = \frac{6 \sum (O - E)^2}{\sum O},$$

which represents a simplification of Friedman’s twenty-year-old notation (2). The new form is the classic chi square,

$$\sum \frac{(O - E)^2}{E},$$

multiplied by 6/k where k is simply the number of ranks! This conversion was discovered in connection with the present study by a collaboration of the writer with Alan Waterman and David Wiley. Proof that it is identical with the earlier and more cumbersome variation,

$$\chi_r^2 = \frac{12}{Nk(k + 1)} \sum (R_j)^2 - 3N(k + 1),$$

will be included in a future statistical article.

TABLE 1
ILLUSTRATION OF RANKED DATA

[Part A: raw scores on second test. Part B: ranks-within-levels on second test. Columns N, F, and S for each level, with column sums. The table entries are not recoverable.]

Note: N is No Comment; F is Free Comment; S is Specified Comment.
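The agreement between the simplified formula and Friedman's older form can be checked numerically. The sketch below is an illustration only; the function names are hypothetical. The rank sums 1363 (N) and 1488 (F) are the Table 2 entries for individual subjects; the S sum of 1427 and the 713 levels are assumptions inferred from the 2,139 students, since each level of three contributes ranks totaling 6.

```python
def friedman_simplified(rank_sums):
    """Simplified form: 6 * sum((O - E)^2) / sum(O), with E the mean rank sum."""
    total = sum(rank_sums)
    expected = total / len(rank_sums)
    return 6 * sum((o - expected) ** 2 for o in rank_sums) / total


def friedman_classic(rank_sums, n_levels):
    """Friedman's form: 12 / (N k (k + 1)) * sum(R_j^2) - 3 N (k + 1)."""
    k = len(rank_sums)
    sum_sq = sum(r ** 2 for r in rank_sums)
    return 12 / (n_levels * k * (k + 1)) * sum_sq - 3 * n_levels * (k + 1)


n_levels = 2139 // 3                 # 713 levels of three students (inferred)
rank_sums = [1363, 1488, 1427]       # N and F from Table 2; S inferred as the remainder
chi_simplified = friedman_simplified(rank_sums)
chi_classic = friedman_classic(rank_sums, n_levels)
```

Under these assumptions both forms give about 10.959, in line with the value reported for individual subjects in Table 2.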


RESULTS


Comment vs. no comment. The over-all 
significance of the comment effects, as 
measured by the analysis of variance by 
ranks, is indicated in Table 2. The first 
row shows results obtained when students 
were considered as matched independently 
from one common population. The second 
row shows results when treatment groups 
within classes were regarded as intact 
groups. In either case the conclusions were 
the same. The Specified Comment group, 
which received automatic impersonal comments according to the letter grade received, achieved higher scores than the
No Comment group. The Free Comment 
group, which received individualized com- 
ments from the teachers, achieved the 
highest scores of all. Not once in a hundred 
times would such differences have occurred 
by chance if scores were drawn from a 
common population. Therefore it may be 
held that the comments had a real and 
beneficial effect upon the students’ mas- 
tery of subject matter in the various ex- 
perimental classes. 

TABLE 2
THE FRIEDMAN TEST OF THE OVER-ALL TREATMENT EFFECTS

Units considered        N        F        χ²r        p
Individual Subjects     1363     1488     10.9593    < .01
Class-group Subjects    129.5    170.0    11.3310    < .01

It was also possible, as indicated earlier, to use the mean ranks within treatments within classes as parametric scores. The resulting distributions, being normally distributed and fulfilling certain other assumptions underlying parametric tests, permitted other important comparisons to be made.* Table 3 shows the mean-ranks data necessary for such comparisons.

TABLE 3
PARAMETRIC DATA BASED UPON MEAN RANKS WITHIN TREATMENTS WITHIN CLASSES

Source                          N         F         S         Total
Number of Groups                74        74        74        222
Sum of Mean Ranks               140.99    154.42    148.59    444.00
Sum of Squares of Mean Ranks    273.50    327.50    304.01    905.01
Mean of Mean Ranks              1.905     2.087     2.008     2.000
S.D. of Mean Ranks              .259      .265      .276
S.E. of Mean Ranks              .030      .031      .032

TABLE 4
ANALYSIS OF VARIANCE OF MAIN TREATMENT EFFECTS
(Based on Mean Ranks)

Source                          Sum of Squares    df     Mean Square    F       Probability
Between Treatments: N, F, S     1.23              2      .615           5.69    < .01
Between Class-groups            0.00              73     .000
Interaction: T × Class          15.78             146    .108
Total                           17.01             221

Note: Modeled after Lindquist (3), p. 157 et passim, except for unusual conditions noted.

The various tests are summarized in 
Tables 4 and 5. The over-all F test in 
Table 4 duplicated, as one would expect,
the result of the Friedman test, with dif- 
ferences between treatment groups still 
significant beyond the .01 level. Compari- 
sons between different pairs of treat- 
ments are shown in Table 5. All differ- 
ences were significant except that between 
Free Comment and Specified Comment. 
It was plain that comments, especially 
the individualized comments, had a 
marked effect upon student performance. 
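The arithmetic behind the over-all F test and the paired comparisons can be retraced from the published summary figures. The following is a sketch, not the author's computation: it takes the sums of mean ranks (140.99, 154.42, 148.59) and the total sum of squared mean ranks (905.01) as given in Table 3, and the differences and standard errors as given in Table 5; the variable and function names are hypothetical.

```python
# Summary values read from Table 3 (74 class-groups, three treatments).
n_classes = 74
sums = {"N": 140.99, "F": 154.42, "S": 148.59}
sum_of_squares_all = 905.01  # sum of squared mean ranks over all 222 cells

k = len(sums)
correction = sum(sums.values()) ** 2 / (n_classes * k)
ss_total = sum_of_squares_all - correction
ss_treatments = sum(v ** 2 for v in sums.values()) / n_classes - correction
ss_classes = 0.0  # every class has the same mean rank, (k + 1) / 2
ss_interaction = ss_total - ss_treatments - ss_classes

df_treatments = k - 1
df_interaction = (k - 1) * (n_classes - 1)
f_ratio = (ss_treatments / df_treatments) / (ss_interaction / df_interaction)


def paired_t(difference, standard_error):
    """t for matched pairs: mean difference divided by its standard error."""
    return difference / standard_error


t_n_f = paired_t(0.182, 0.052)  # No Comment vs. Free Comment
t_n_s = paired_t(0.103, 0.054)  # No Comment vs. Specified Comment
```

Carried at full precision the F ratio comes to about 5.67; the 5.69 of Table 4 follows when the rounded mean squares .615 and .108 are divided instead.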

Comments and schools. One might ques- 
tion whether comment effects would vary 
from school to school, and even whether 
the school might not be the more appro- 
priate unit of analysis. Since as it hap- 
pened the study had 12 junior or senior 
high schools which had three or more 
experimental classes, these schools were 
arranged in a treatments-by-replications 
design. Results of the analysis are shown 
in Table 6. Schools apparently had little 
measurable influence over treatment effect. 

Comments and school years. It was conceivable that students, with increasing age and grade-placement, might become increasingly independent of comments and other personal attentions from their teachers. To test such a belief, 66 class-groups, drawn from the experimental classes, were stratified into six school years (Grades 7-12) with 11 class-groups in each school year. Still using mean ranks as data, summations of such scores were as shown in Table 7. Rather surprisingly, no uniform trend was apparent. When the data were tested for interaction of school year and comment effect (see Table 8), school year did not exhibit a significant influence upon comment effect.

* It may be noted that the analysis of variance based upon such mean ranks will require no calculation of sums of squares between levels or between classes. This is true because the mean for any class will be (k + 1)/2, or in the present study just 2.00.... An alternative to such scores would be the conversion of all scores to T scores based upon each class-group’s distribution; but the mean ranks, while very slightly less sensitive, are much simpler to compute and therefore less subject to error.

TABLE 5
DIFFERENCES BETWEEN MEANS OF THE TREATMENT GROUPS

Comparison         Difference    S.E. of Diff.    t              Probability
Between N and F    .182          .052             3.500          < .001
Between N and S    .103          .054             1.907          < .05
Between F and S    .079          .056             [illegible]    < .10 (n.s.)

Note: The t tests presented are those for matched pairs, consisting of the paired mean ranks of the treatment groups within the different classes. Probabilities quoted assume that one-tailed tests were appropriate.

Though Table 8 represents a compre- 
hensive test of school-year effect, it was 
not supported by all available evidence. 
Certain other, more limited tests did 
show significant differences in school year, 
with possibly greater responsiveness in 
higher grades. The relevant data (6, chap. 
5) are too cumbersome for the present 
report, and must be interpreted with cau- 
tion. Apparently, however, comments do 
not lose effectiveness as students move 
through school. Rather they appear fairly 
important, especially when individualized, 
at all secondary levels. 

One must remember that, between the present class-groupings, there were many differences other than school year alone. Other teachers, other subject-matter fields, other class conditions could conceivably have been correlated beyond chance with school year. Such correlations would in some cases, possibly, tend to modify the visible school-year influence, so that illusions would be created. However possible, such a caution, at present, appears rather empty. In absence of contradictory evidence, it would seem reasonable to extrapolate the importance of comment to other years outside the secondary range. One might predict that comments would appear equally important if tested under comparable conditions in the early college years. Such a suggestion, in view of the large lecture halls and detached professors of higher education, would appear one of the more striking experimental results.

TABLE 6
THE INFLUENCE OF THE SCHOOL

[Sources of variation: Between Treatments: N, F, S; Between Schools; Between Classes Within Schools (pooled); Interaction: T × Schools; Interaction: T × Cl. W. Sch. (pooled); Total. The table entries are not recoverable; the only surviving value is 6.890.]

Note: Modified for mean-rank data from Edwards (1, p. 295 et passim).
* Absence of an important main treatment effect is probably caused by necessary restriction of sample for school year (N is 36, as compared with Total N of 74), and by some chance biasing.

TABLE 7
SUMS OF MEAN RANKS FOR DIFFERENT SCHOOL YEARS

[Columns: School Year, N, F, S. Only one column of sums survives, and its treatment label is not recoverable; it is given in the order found.]

School Year    Sum of Mean Ranks (treatment column unidentified)
12             22.00
11             23.03
10             22.60
9              21.60
8              22.40
7              20.98

Note: Number of groups is 11 in each cell.

Comments and letter grades. In a questionnaire made out before the experiment, each teacher rated each student in his class with a number from 1 to 5, according to the student’s guessed responsiveness to comments made by that teacher. Top rating, for example, was paired with the description: “Seems to respond quite unusually well to suggestions or comments made by the teacher of this class. Is quite apt to be influenced by praise, correction, etc.” Bottom rating, on the other hand, implied: “Seems rather negativistic about suggestions made by the teacher. May be inclined more than most students to do the opposite from what the teacher urges.” In daily practice, many teachers comment on some papers and not on others. Since teachers would presumably be more likely to comment on papers of those students they believed would respond positively, such ratings were an important experimental variable.

TABLE 8
THE INFLUENCE OF SCHOOL YEAR UPON TREATMENT EFFECT

Source                                  df     Mean Square    F       Probability
Between Treatments: N, F, S             2      .530           5.25    < .01
Between School Years                    5      .000
Between Cl. Within Sch. Yr. (pooled)    60     .000
Interaction: T × School Year            10                    1.12    (n.s.)
Interaction: T × Class (pooled)         120
Total                                   197

[The Sum of Squares column and the two interaction Mean Square entries are not recoverable.]

Note: Modified for mean-rank data from Edwards (1, p. 295 et passim).

Whether teachers were able to predict 
responsiveness is a complicated question, 
not to be reported here. It was thought, 
however, that teachers might tend to 
believe their able students, their high 
achievers, were also their responsive stu- 
dents. A contingency table was therefore 
made, testing the relationship between 
guessed responsiveness and letter grade
achieved on the first test. The results were 
as predicted. More “A” students were 
regarded as highly responsive to comments 
than were other letter grades; more “F” 
students were regarded as negativistic and 
unresponsive to comments than were 
other letter grades; and grades in between 
followed the same trend. The over-all C 
coefficient was .36, significant beyond the 
.001 level.* Plainly teachers believed that 
their better students were also their more 
responsive students. 

If teachers were correct in their belief, 
one would expect in the present experi- 
ment greater comment effect for the better 
students than for the poorer ones. In fact, 
one might not be surprised if, among the 
“F” students, the No Comment group 
were even superior to the two comment 
groups. 

* In a 5 × 5 table, a perfect correlation expressed as C would be only about .9 (McNemar [4], p. 205).
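The ceiling on C mentioned in the footnote follows from the standard definition of Pearson's contingency coefficient, C = sqrt(χ² / (χ² + N)), whose upper limit for a k × k table is sqrt((k - 1) / k). A brief sketch, with hypothetical function names:

```python
import math


def contingency_coefficient(chi_square, n):
    """Pearson's contingency coefficient C from chi square and sample size."""
    return math.sqrt(chi_square / (chi_square + n))


def max_contingency_coefficient(k):
    """Upper limit of C for a k-by-k table."""
    return math.sqrt((k - 1) / k)


c_max = max_contingency_coefficient(5)   # ceiling for the 5 x 5 table
relative_c = 0.36 / c_max                # reported C as a share of that ceiling
```

For the 5 × 5 table the limit is about .894, so the reported C of .36 is roughly two-fifths of the largest value attainable.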
TABLE 9
MEAN OF MEAN RANKS FOR DIFFERENT LETTER GRADES

Letter Grade    N
A               1.93
B               1.91
C               1.90
D               2.05
F               1.57

[The F and S columns are not recoverable.]

Note: Each eligible class was assigned one mean rank for each cell of the table.


The various letter grades achieved mean 
scores as shown in Table 9, and the analy- 
sis of variance resulted as shown in Table 
10. There was considerable interaction be- 
tween letter grade and treatment effect, 
but it was caused almost entirely by the 
remarkable effect which comments ap- 
peared to have on the “F” students. None 
of the other differences, including the par- 
tial reversal of the “D” students, exceeded 
chance expectation. 

These data do not, however, represent 
the total sample previously used, since 
the analysis could use only those student 
levels in which all three students received 
the same letter grade on Test One.* Therefore many class-groups were not represented at all in certain letter grades. For
example, although over 10% of all letter 
grades were “F,” only 28 class-groups had 
even one level consisting entirely of “F” 
grades, and most of these classes had only 
one such level. Such circumstances might 
cause a somewhat unstable or biased esti- 
mate of effect. 

Within such limitations, the experiment provided strong evidence against the teacher-myth about responsiveness and letter grades. The experimental teachers appeared plainly mistaken in their faith that their “A” students respond relatively brightly, and their “F” students only sluggishly or negatively to whatever encouragement they administer.

* When levels consisted of both “A” and “B” students, for example, “A” students would tend to receive the higher scores on the second test, regardless of treatment; thus those Free Comment “A” students drawn from mixed levels would tend to appear (falsely) more responsive than the Free Comment “B” students drawn from mixed levels, etc. Therefore the total sample was considerably reduced for the letter-grade analysis.

TABLE 10
THE RELATION BETWEEN LETTER GRADE AND TREATMENT EFFECT

Source                                Sum of Squares    df     Mean Square    F       Probability
Between Treatments: N, F, S                             2      1.385          5.41    < .01
Between Letter Grades                                   4      0.000
Bet. Blocks Within L. Gr. (pooled)                      65     0.000
Interaction: T × Letter Grades                          8      .610           2.40    .05 > p > .01
Residual (error term)                                   130    .254
Total                                 40.64             209

[Sum of Squares entries other than the total are not recoverable.]

Note: Modified for mean-rank data from Lindquist (3, p. 269). Because sampling was irregular (see text) all eligible classes were randomly assigned to 14 groupings. This was done arbitrarily to prevent vacant cells.


SUMMARY


Seventy-four randomly selected second- 
ary teachers, using 2,139 unknowing stu- 
dents in their daily classes, performed the 
following experiment: They administered 
to all students whatever objective test 
would occur in the usual course of in- 
struction. After scoring and grading the 
test papers in their customary way, and 
matching the students by performance,
they randomly assigned the papers to one 
of three treatment groups. The No Com- 
ment group received no marks beyond 
those for grading. The Free Comment 
group received whatever comments the 
teachers felt were appropriate for the 
particular students and tests concerned. 
The Specified Comment group received 
certain uniform comments designated be- 
forehand by the experimenter for all simi- 
lar letter grades, and thought to be gen- 
erally “encouraging.” Teachers returned 
tests to students without any unusual at- 
tention. Then teachers reported scores 
achieved on the next objective test given 


in the class, and these scores became the 
criterion of comment effect, with the fol- 
lowing results: 

1. Free Comment students achieved 
higher scores than Specified Comment stu- 
dents, and Specified Comments did better 
than No Comments. All differences were 
significant except that between Free Com- 
ments and Specified Comments. 

2. When samplings from 12 different 
schools were compared, no significant dif- 
ferences of comment effect appeared be- 
tween schools. 

3. When the class-groups from six dif- 
ferent school years (grades 7-12) were 
compared, no conclusive differences of 
comment effect appeared between the 
years, but if anything senior high was
more responsive than junior high. It would 
appear logical to generalize the experi- 
mental results, concerning the effective- 
ness of comment, at least to the early 
college years. 

4. Although teachers believed that their 
better students were also much more re- 
sponsive to teacher comments than their 
poorer students, there was no experimental 
support for this belief. 

When the average secondary teacher 
takes the time and trouble to write com- 
ments (believed to be “encouraging”) on 
student papers, these apparently have a 
measurable and potent effect upon stu- 
dent effort, or attention, or attitude, or 
whatever it is which causes learning to improve, and this effect does not appear dependent on school building, school year, or student ability. Such a finding would seem very important for the studies of classroom learning and teaching method.
COMPARISON OF ORGANISMIC AGE AND REGRESSION EQUATIONS
IN PREDICTING ACHIEVEMENTS IN ELEMENTARY SCHOOL 


HERBERT J. KLAUSMEIER, ALAN BEEMAN, AND IRVIN J. LEHMANN 


University of Wisconsin 


Olson (7) and Millard (6) report that 
both the average of and the relationships 
among seven ages in months—height, 
weight, grip, dental, carpal, mental, and 
reading—are useful in appraising the 
child’s level of performance in other areas 
such as language and arithmetic. Klaus- 
meier (5), however, found no statistically 
significant differences in height, weight, 
grip, dentition, and carpal age between 
high- and low-achievers in arithmetic and 
language. With this finding the present 
study was undertaken to ascertain how 
useful the first seven measures were (a) 
when combined in a best-combination regression equation and (b) when considered
of equal weight as in Olson’s system of 
organismic age, for predicting arithmetic 
and language achievement 12 months after 
the original measures were secured. 


PROCEDURE 


The subjects of this investigation were 
21 boys and 24 girls who were third- 
graders in 1955-56 and fourth-graders in 
1956-57 and 29 boys and 24 girls who were 
fifth-graders in 1955-56 and sixth-graders 
in 1956-57. These children were enrolled 
in four regular classes of two large elemen- 
tary schools of Madison and were all the 
children enrolled in the classes at the times
the two sets of measures were secured. 

The following measures were obtained: 
standing height, weight, strength of grip 
of the preferred hand, number of perma- 
nent teeth, bone development of the wrist 
and hand, mental age with the California 
Short-Form Test of Mental Maturity 
(S-Form), and achievement in reading, 
arithmetic, and language with the Cali- 
fornia Achievement Tests (Complete Bat- 
tery, Form AA, Primary, Elementary, and 


Intermediate). In all instances results of a 
single measure were used except for 
strength of grip and carpal age. In secur- 
ing strength of grip, a first measure was 
taken with the palm of the hand upward, 
grasping the dynamometer; the second 
with the palm downward; and the third 
with the palm upward. The average of the 
highest two of the three measures was used 
as strength of grip. This method was found 
to yield a higher test-retest correlation, 
0.92, than using the single highest score 
with the palm upward, 0.83. All X-rays 
were read independently by the same two 
resident radiologists and the average of 
the two readings in months was used as 
carpal age. 
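The grip score described above can be written out in a few lines. This is an illustrative sketch only; the function name and the sample readings are hypothetical.

```python
def grip_strength(trials):
    """Average of the two highest of three dynamometer readings (kilograms)."""
    if len(trials) != 3:
        raise ValueError("expected exactly three readings")
    top_two = sorted(trials)[-2:]
    return sum(top_two) / 2


# Hypothetical readings: palm up, palm down, palm up.
example_score = grip_strength([18.0, 21.0, 20.0])
```

Averaging the two best readings dampens the effect of a single unusually strong or weak squeeze, which is consistent with the higher test-retest correlation reported for this method.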

The above measures were secured in 
October, 1955, and again in October, 1956, 
within the same week for each of the four 
classroom groups, and each measure at ap- 
proximately the same hour of the day. 
Over this twelve-month interval, the meas- 
ures were found to be relatively consistent 
as the correlation coefficients in Table 1 
show. The primary level of the mental 
maturity and the achievement tests was 
used in 1955, the elementary level in 1956. 
The elementary level of the achievement 
battery was used in the fifth grade, the 
intermediate level in the sixth. No change
was made in level of the mental maturity 
test. 

Evidence of reliability and validity of 
the measures is now given since measure- 
ment is crucial in this study. In 1955, a 
random sample of 30 children was drawn 
from the total population of this study 
and remeasured within 24 hours after the 
first measuring of height, weight, and 
strength of grip. The test-retest correlations
were .99 for height, .99 for weight,




TABLE 1

CORRELATIONS BETWEEN MEASURES OBTAINED ONE YEAR APART, IN OCTOBER 1955 AND OCTOBER 1956

Third-Fourth Graders: Boys | Girls; Fifth-Sixth Graders: Boys | Girls


Height (inches) .99 | .82 | .96 
Weight (pounds) 87 | .96| .97 | .f 
Grip (kilograms) ‘ .83 | .76 | .6% 
Permanent teeth 47 | .46 | .77 
Carpal Age (month) .85 | .73 | .83 
Mental Maturity 43 | .76 | 
Score | 
Reading Test Score | .55 | .85 | .86 | . 
Arithmetic Score | .75  .86 .76 
Language Test Score’ .87 | .66 | .62 


and .92 for grip. The two radiologists’ 
independent readings of all the X-rays 
showed a correlation of .95 for third- 
graders and .86 for fifth-graders. Checks 
of successive dental records by the re- 
searchers showed that the dentist had 
identified permanent teeth without error. 
Thus, reliability of the five physical meas- 
ures is considered high; and the test 
manuals report high reliability for the in- 
telligence and achievement measures used. 
In addition, the correlations on the 
achievement measures reported above over 
the 12-month period indicate quite high 
reliability. 
Concurrent validity of certain measures
was also determined with the third-grade 
children. For a random sample of 30, 
California M.A. and Stanford-Binet M.A., 
obtained within six weeks of each other, 
correlated .82. Scores from the California 
Achievement Test in Reading correlated 
.90 with the Gates Advanced Primary 
Reading Test. Also, each of the four 
teachers ranked their children from highest 
to lowest in reading, arithmetic, and lan- 
guage achievement. The resulting rank- 
order correlations between test scores and 
teacher ratings are: reading, .91, .91, .90 and .88; arithmetic, .65, .82, .77 and .56; language, .69, .83, .85 and .77. In each
set of four correlations, the first two are 
for third-grade and the last two for fifth- 
grade classes. Considering the difficulty of 
arranging a group of 30 children in rank 
order in each of the subject-matter areas, 
the researchers consider the correlations 
as indicating that the achievement tests 
measure the teachers’ objectives suffi- 
ciently well for the purpose of this study. 


FINDINGS


In Table 2 are presented the mean 
Pearson product-moment correlations 
among the measures as obtained in Oc- 
tober, 1955. The mean correlations for the 
four groups of boys and girls are presented 
in Table 2 rather than each correlation on 
which the regression equations are based in 


TABLE 2 
MEAN CORRELATIONS AMONG RAW SCORES IN NINE MEASURES OF 
BOYS AND GIRLS, THIRD AND FIFTH GRADES 

[Table body illegible in the source scan. Rows and columns: Height, Weight, Grip, Dental, Carpal, Mental, Reading, Arithmetic, Language.] 

TABLE 3 
CONTRIBUTIONS OF SEVEN MEASURES TO THE CORRECTED MULTIPLE Rs AND BETA 
WEIGHTS FOR THE REGRESSION EQUATION TO PREDICT ARITHMETIC ACHIEVEMENT 

[Table body illegible in the source scan. Columns: Height, Weight, Grip, Dental, Carpal, Mental, Reading. Rows: Beta weight and R for 3rd boys, 3rd girls, 5th boys, and 5th girls; the figure in parentheses after each R gives the order of entry.] 

Note. Blanks signify that the measure did not contribute .01 to the corrected Multiple R and did not then enter 
the regression equation. 


order to present a more concise summary. 
For a correlation to differ significantly 
from 0 at the .05 level (2), it 
must reach a critical value between .367 and .404, depending 
upon the size of the N previously given. 
Table 2 shows that no mean correlation 
between the five physical measures and the 
three achievement measures is significant 
at the .05 level and dentition does not cor- 
relate significantly with any physical 
measure. Of the original 80 correlations 
between physical and achievement meas- 
ures, only two, weight and language—fifth 
grade, were significant at the .05 level. 
However the other two measures com- 
prising the basis of organismic age, mental 
and reading, correlate positively and sig- 
nificantly with arithmetic and language 
achievement. 
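
The critical values quoted above can be checked with a short computation. The sketch below is an illustration only: it assumes a two-tailed test of r against zero with N - 2 degrees of freedom, and the sample sizes 24 and 29 are assumptions chosen because they reproduce the stated bounds; the Ns "previously given" are not restated in this section. The function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on the interval [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def critical_r(n, alpha=0.05):
    """Smallest |r| significant at two-tailed alpha for n paired scores."""
    df = n - 2
    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection for the t value with alpha/2 in the upper tail
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha / 2:
            lo = mid
        else:
            hi = mid
    t = (lo + hi) / 2
    return t / math.sqrt(t * t + df)

# Assumed group sizes; they are not taken from the article.
for n in (24, 29):
    print(n, round(critical_r(n), 3))
```

Under these assumptions the larger group needs the smaller correlation to reach significance, which is why a range of critical values is quoted rather than a single figure.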

Multiple correlations were computed, 
using the original correlations by grade 
and sex, and regression equations were de- 
rived to predict language and arithmetic 
achievement 12 months later. The multiple 
R and regression equation were calculated 
by the Wherry-Doolittle Test Selection 
Method (3). Any of the seven measures 
contributing .01 or more to the multiple 
R, uncorrected for shrinkage, were in- 


cluded in the multiple regression equation, 
provided their inclusion did not actually 
lower the multiple R when corrected for 
shrinkage. The Beta weights in the multi- 
ple regression equations for predicting 
scores were secured with the IBM 650 
computer by a method of inverse correla- 
tion matrices. Table 3 shows the corrected 
multiple Rs obtained between the seven 
organismic measures and arithmetic 
achievement, the order in which the vari- 
ous measures went into the multiple cor- 
relations, and the Beta weights for each 
regression equation. Reading correlated 
higher than any other measure with arith- 
metic for the four groups and thus went 
first into the multiple R and the regression 
equation. Differences between groups in 
the order in which the seven measures went 
into the regression equations and differ- 
ences in Beta weights are not so impor- 
tant, however, as the finding that the best 
combination of all five physical measures 
increased the corrected multiple R by only 
.060 for the third-grade boys, by .032 for 
third-grade girls, by .053 for fifth-grade 
boys, and by .020 for fifth-grade girls. 
Table 4 shows similarly that the physical 
measures increased the corrected multiple 
TABLE 4 
CONTRIBUTIONS OF SEVEN MEASURES TO THE CORRECTED MULTIPLE Rs AND BETA 
WEIGHTS FOR THE REGRESSION EQUATION TO PREDICT LANGUAGE ACHIEVEMENT 

[Table body illegible in the source scan. Columns: Height, Weight, Grip, Dental, Carpal, Mental, Reading. Rows: Beta weight and R for 3rd boys, 3rd girls, 5th boys, and 5th girls; the figure in parentheses after each R gives the order of entry.] 

Note. Blanks signify that the measure did not contribute .01 to the corrected Multiple R and did not then enter 
the regression equation. 


R for language above that obtained with 
reading or a combination of reading and 
mental age by .00 for third-grade boys, 
.042 for third-grade girls, .127 for fifth-grade 
boys, and .034 for fifth-grade girls. 
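
The inclusion rule described above (a measure enters if it adds at least .01 to the uncorrected multiple R and does not lower the R corrected for shrinkage) can be sketched as follows. This is not the Wherry-Doolittle procedure itself: the shrinkage formula used here is one common Wherry-type correction and is an assumption, since the exact variant applied in the study is not stated, and the stepwise R values and sample size are hypothetical.

```python
import math

def corrected_r(r, n, k):
    """Shrinkage-corrected multiple R for n cases and k predictors.

    Assumed form: R_c^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1); negative values are set to 0.
    """
    r2 = 1 - (1 - r * r) * (n - 1) / (n - k - 1)
    return math.sqrt(max(r2, 0.0))

def predictors_kept(stepwise_rs, n, min_gain=0.01):
    """Count how many predictors enter, given the uncorrected R after each step."""
    kept, prev_r, prev_corrected = 0, 0.0, 0.0
    for k, r in enumerate(stepwise_rs, start=1):
        corrected = corrected_r(r, n, k)
        # Stop when the gain in uncorrected R is too small or the corrected R falls.
        if r - prev_r < min_gain or corrected < prev_corrected:
            break
        kept, prev_r, prev_corrected = k, r, corrected
    return kept

# Hypothetical uncorrected Rs as measures are added one at a time (n = 30 children).
print(predictors_kept([0.70, 0.74, 0.745, 0.75], n=30))
```

In this hypothetical run the third measure adds only .005 to R, so selection stops with two predictors.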

In an attempt to ascertain whether or- 
ganismic age is a better predictor of 
achievement in arithmetic and language 
than is the predicted score derived by 
means of regression equations, Pearson 
product-moment correlations were cal- 
culated between regression-equation pre- 
dicted scores and the raw scores in arith- 
metic and language obtained one year 
later, and also between organismic age in 
months and the scores obtained one year 
later. These results are presented in 
Table 5. 


Table 5 indicates that the correlations 
are higher between the 1956 obtained and 
1956 predicted scores derived from re- 
gression equations than between the 1956 
obtained scores and the 1955 organismic 
age. Four of the eight correlations be- 
tween organismic age and predicted 
achievement in arithmetic and language 
are significant at the .05 level; but all the 
correlations between regression-equation 
predicted scores and obtained scores are 
significant beyond the .01 level. It is antic- 
ipated, of course, that were the same re- 
gression equations applied to other sam- 
ples, the obtained correlations between 
predicted and actual scores would be 
lower. 
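
The comparison reported in Table 5 amounts to correlating two different predictors with the same later score. The sketch below shows the arithmetic on invented data; the four records, the seven age values per child, and the function names are all hypothetical, and organismic age is taken to be the plain average of the seven ages, as the Summary describes it.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def organismic_age(ages_in_months):
    """Average of the separate age scores (all expressed in months)."""
    return sum(ages_in_months) / len(ages_in_months)

# Hypothetical records: (seven ages in months, regression-predicted score, score a year later).
children = [
    ([100, 104, 98, 110, 102, 108, 112], 41.0, 43),
    ([96, 99, 101, 95, 100, 104, 98], 36.5, 35),
    ([108, 102, 105, 111, 109, 118, 121], 47.0, 49),
    ([103, 107, 100, 99, 104, 96, 94], 33.0, 36),
]
oa = [organismic_age(ages) for ages, _, _ in children]
predicted = [p for _, p, _ in children]
obtained = [s for _, _, s in children]

r_organismic = pearson_r(oa, obtained)      # organismic age vs. obtained score
r_regression = pearson_r(predicted, obtained)  # regression prediction vs. obtained score
print(round(r_organismic, 3), round(r_regression, 3))
```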

The present results are in accord with 


TABLE 5 
CORRELATIONS BETWEEN ORGANISMIC AGE PREDICTION, REGRESSION-EQUATION 
PREDICTION AND OBTAINED ARITHMETIC AND LANGUAGE SCORES 

            Arithmetic Score   Language Score    Arithmetic Score   Language Score 
            and Organismic     and Organismic    and Regression     and Regression 
Group       Age Prediction     Age Prediction    Prediction         Prediction 

3rd boys    .370               .273              .683               .768 
3rd girls   .089               .491              .731               .665 
5th boys    .349               .472              .629               .657 
5th girls   .586               .581              .743               .693 
those of Gates (4) and Blommers, Knief 
and Stroud (1), who used designs different 
from the present study. 


SUMMARY 


This study was conducted to compare 
the efficiency of organismic age, the aver- 
age of seven ages, and regression equations, 
based on raw scores in the same seven 
measures, in predicting arithmetic and 
language achievements of third- and fifth- 
grade children 12 months after the origi- 
nal measures were secured. The regression- 
equation predictions correlated higher 
with actual achievements than did organ- 
ismic-age predictions. The five physical 
measures in organismic age contributed 
little to mental and reading scores in pre- 
dicting arithmetic and language scores. 


H. J. KLAUSMEIER, A. BEEMAN, AND I. J. LEHMANN 
SEX DIFFERENCES IN THE RETENTION OF 
QUANTITATIVE INFORMATION 


ROBERT SOMMER* 
The Saskatchewan Hospital, Weyburn 


Sex differences in arithmetic reasoning 
and spatial relations have consistently 
been found. Men are superior to women 
on these tests, while women excel on tasks 
requiring verbal ability and memory (1, 
5). However, several authors (1, 5) state 
that the female superiority in recall does 
not hold if the material is more interesting 
to the males. This is a reasonable state- 
ment but there has been very little re- 
search designed to investigate the reasons 
for this. It is also interesting to note that 
there has been far more research relating 
to motivational factors in perception than 
there has been research relating to 
motivational factors in memory. Research 
on this latter point should be of 
concern to educators and others who hope 
to teach subject matter to students who 
vary widely in their interest in the content 
of courses. 

This study had its origins in the ob- 
servation that during routine testing of 
hospital patients with the Wechsler- 
Bellevue Information Scale, men did con- 
sistently better than women on items in- 
volving estimations of size or distance. 
Even intelligent women usually would be 
unaware of the population of the United 
States or the distance from New York to 
Paris. There seemed an inability to retain 
this information. It should be stressed that 
this was not a matter of computation or of 
analytic reasoning, rather it was a ques- 
tion of the retention of information that 
they had been exposed to a number of 
times. 


* This study was started by the writer and 
John Hinkle, Prabha Khanna, and Walter 
McDonald, and carried to conclusion by 
the present writer. We would like to thank 
J. B. Ray, G. McMurray, and H. Cooper- 
stock for use of their classes for testing. 


This sex difference is not unknown to 
writers and educators. Weber (6) writes, 
“Mention mathematics to a woman and she 
freezes into a condescending attitude of 
tolerance—she knows it exists, she uses 
it when she must, but it certainly has 
very little to do with her own delightfully 
imaginative and delicate world of inter- 
ests.” However our experience has been 
that this debility is more fundamental than 
simply a reflection of hostility to mathe- 
matical reasoning. Schilder (3) speaks of 
our remembering “only what we can and 
will use in the present situation.” If this 
is so, then investigation into the retention 
of information should lead us into some 
rather basic attitudes regarding what type 
of information women believe is useful. 

The purpose of the present studies is to 
determine whether these sex differences in 
recall of sizes and distances, observed 
clinically with hospital patients, would ap- 
pear in other samples. Also of interest is 
whether there will be sex differences in the 
immediate recall of new quantitative ma- 
terial. 


EXPERIMENT ONE 


Procedure 


Two populations of Ss were sampled. 
The first consisted of patients in a mental 
hospital whose psychological test records 
were available in the files. The second 
was composed of students in elementary 
psychology classes at a small Midwestern 
college. The patient sample included only 
those whose last initials began with B, N, 
P, and T, who had been tested with the 
Information Scale of the Wechsler-Bellevue, 
who were between 18 and 69 years 
old, and who had IQ’s of above 70. This 
provided 156 cases, 96 males and 60 females. 


TABLE 1 
PERCENTAGE OF CORRECT RESPONSES TO QUANTITATIVE ITEMS 

                            Male       Female               Male       Female 
Item                        patients   patients   p**       students   students 
                            N = 96     N = 60               N = 61     N = 34 

Population (U.S.)           41         18         (.01)     64         47 
Distance                    36         10         (.001)    54         59 
Pints                       *          *                    72         91 
Teaspoons                   *          *                    13         70 
Population (college town)   *          *                    51         28 

* Item either not administered or scored for this group. 
** All p values are based on chi-square tests. 


The students were all between 18 
and 28 years old and constituted a sample 
of 61 males and 34 females. 

The procedure for the patients was 
simply to tally and compare the number 
of correct “population of U.S.” and “dis- 
tance from New York to Paris” responses 
from males and females. (It can be 
noted that there was not a significant 
difference between the mean IQ of the 
males, 94.3, and that of the females, 95.6). 
The students were all tested in their class- 
rooms by an examiner who requested them 
to answer the following questions: 


1. How far is it from New York to Paris? 

2. What is the population of the United States? 

3. How many pints are there in a quart? 

4. How many teaspoons are there in a tablespoon? 

5. What is the population of (the town in which the college is located)? 


TABLE 2 
PERCENTAGE OF CORRECT RESPONSES TO QUANTITATIVE ITEMS 

                               Male       Female 
Item                           students   students 
                               N = 89     N = 65 

Population (Canada)            85         55 
Distance                       25 
Pints                          86 
Teaspoons                      48 
Population (university town)   52 

[Second entry illegible in the source scan for every row after the first.] 


Results 


The responses to the information items 
are presented in Table 1. It is evident that 
there are very large differences with regard 
to estimating the population of the U.S. 
(with males excelling) and in recalling the 
number of teaspoons in a tablespoon (with 
females excelling) while many of the other 
differences are moderately large. 
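
As an illustration of the chi-square tests behind the p values in Table 1, the sketch below works through the patients' "population of U.S." item. It assumes an ordinary 2 x 2 Pearson chi-square without continuity correction (the article does not say whether a correction was applied), and it recovers the cell counts by rounding the reported percentages, which is also an assumption.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square and p value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability reduces to erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# 41% of 96 male patients and 18% of 60 female patients answered correctly.
male_correct = round(0.41 * 96)
female_correct = round(0.18 * 60)
chi2, p = chi_square_2x2(male_correct, 96 - male_correct,
                         female_correct, 60 - female_correct)
print(round(chi2, 2), round(p, 4))
```

With these reconstructed counts the probability falls between .001 and .01, in line with the .01 level entered in the table for this item.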


EXPERIMENT Two 


In a study of this sort where a classifica- 
tion of Ss by sex is used, it is hazardous 
to speak of a genuine sex difference until 
a number of different groups have been 
sampled. Although the preceding table 
compared both college students and pa- 
tients, it seemed in order to replicate the 
study with a fresh sample. 

This time a group of 154 Canadian uni- 
versity students was used. The procedure 
followed was the same as in the previous 
study except that the questions were 
altered to suit the Canadian culture; that 
is, the Ss were asked the population of 
Canada, the population of their university 
town, etc. The results are presented in 
Table 2. It is clear that the sex differences 
in the previous sample are supported by 
these results. The males do far better on 
the population and distance items, while 
the females excel in recalling the number 
of teaspoons in a tablespoon. There is no 
difference between the sexes in answering 
the pints in a quart item. 


EXPERIMENT THREE 


With the procedures used in Experi- 
ments One and Two, we were not able to 
control the exposure of our Ss to the in- 
formation that was requested. Hence it 
was thought desirable to present new 
material to a group of Ss and see if the 
males would surpass the females in re- 
membering quantitative material. 


Procedure 


Two brief paragraphs were constructed, 
each containing both quantitative and non- 
quantitative material. Care was taken to 
see that the quantitative material should 
be “new” to the Ss so that any differences 
could not be attributed to previous ex- 
posure. Hence the “facts” that were pre- 
sented were fabricated and for the most 
part incorrect. The paragraphs were as 
follows: 


The Swedish ship, the Queen Fredrika, 
delivered its cargo of 12,000 pounds of wheat 
to Bombay. This city of 1,500,000 in a 
country of 264 million people is one of the 
richest trading ports in the Far East. The 
Captain of the ship was Olaf Hansen. 

Last week was the scene of a bloody 
revolution in Venezuela. This country of 
116,000 square miles is one of the richest 
oil-producing centers in the world. More 
than 1,200,000 barrels are shipped every 
month. The other important exports are 
tin, bananas, and cocoa. 

The Ss in the present study were 49 
women and 27 men who were studying to 
be psychiatric nurses at a large mental 
hospital. All Ss had at least an 11th grade 
education. The Director of Nurses Train- 
ing reported that he did not feel that there 
was any difference between the men and 
the women in intelligence, except that the 
women “did better on the exams” than 
the men. However a control for IQ can 
be found in the number of items of non- 
quantitative material retained by the men 
and the women. 

The Ss were tested at their customary 
class sessions and were told that a para- 
graph would be read to them. When the 
examiner directed them to begin, they 
should write down all they remembered of 
it. The instruction to begin writing fol- 
lowed several seconds after the reading of 
each paragraph. 

In scoring the recall data, the paragraphs 
were divided into “sense units” 
(similar in form to the units of the 
Wechsler Memory Scale). For example, 
/the Swedish ship/ the Queen Fredrika/ 
delivered its cargo/ of 12,000 pounds/... 
etc. This yielded a total of 25 nonquantitative 
units and 5 quantitative units. A 
quantitative unit was scored as correct if 
the number was recalled correctly regardless 
of whether the unit (pounds, bushels, 
etc.) was accurate. All information as to 
the sex of the respondents was removed 
and the scoring was done by the writer. 
Twenty of the protocols were also scored 
independently by another researcher. The 
coefficient of agreement between scorers 
was .93 for the nonquantitative scores and 
1.00 for the quantitative scores. 


TABLE 3 
AVERAGE NUMBER OF ITEMS REMEMBERED (NURSES) 

[Table body illegible in the source scan. Columns: average number remembered and SD for nonquantitative items; average number remembered, SD, and p (one-tail) for quantitative items.] 


Results 


The average number of quantitative 
items recalled by the men was 1.30 ± .26 
while the average number recalled by the 
women was .79 ± .12. This difference is 
significant by t test at beyond the .05 
level. On the nonquantitative items, no 
difference in recall was expected. As is 
shown in Table 3, this prediction was also 
confirmed. 
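
The size of the t ratio implied by these figures can be approximated as below. The sketch rests on two assumptions that the article does not confirm: that the value attached to each mean is a standard error, and that a normal approximation to the one-tailed probability is adequate for groups of 27 and 49. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def t_from_standard_errors(mean_1, se_1, mean_2, se_2):
    """Ratio of a mean difference to the standard error of that difference."""
    return (mean_1 - mean_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)

# Men: 1.30 with .26; women: .79 with .12 (treated here as standard errors).
t = t_from_standard_errors(1.30, 0.26, 0.79, 0.12)

# One-tailed probability under the normal approximation (an approximation, not the exact t test).
p_one_tail = 1 - NormalDist().cdf(t)
print(round(t, 2), round(p_one_tail, 3))
```

Under these assumptions the ratio is a little under 1.8, which clears the one-tailed .05 level but not the .01 level, consistent with the remark that the significance is not high.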

As the level of significance of the dif- 
ference in Table 3 is not high, it was de- 
cided to repeat the procedure using a 
fresh sample. The paragraphs were read 
to students in two elementary sociology 
classes at the University of Saskatchewan. 
The procedure was identical to that used 
with the nurses. The results are presented 
in Table 4 and show that the female stu- 
dents do slightly better than the males 
in recalling the nonquantitative information, 
but the males do significantly better 
than the females in recalling the quanti- 
tative information. When the two samples 
are pooled, a chi-square test shows that 
the sex difference in recalling the quanti- 
tative items is significant beyond the .01 
level (χ² = 5.86, p < .01). 

Discussion 

The results from Experiments One and 
Two confirm the prediction that female 
Ss would be poorer than male Ss on the 
two Wechsler Information items (popu- 
lation and distance) under consideration. 
It should be remembered that although 
these were designated as “quantitative 
items,” they did not involve computa- 
tion, judgement, or even analytic reason- 
ing. To answer a question dealing with 
the population of the U.S. is not ordinarily 
a test in estimating size or number. 
No one has “seen” the population of the 
U.S. and few Ss will attempt to find the 
answer by dividing the world’s population 
by a set percentage. This item involves 
simply the retention of a word or num- 
ber that one has seen and heard many 
times. 

One can attempt to explain these results 
on the basis of the greater familiarity 
of the male Ss with population and distance 
judgments and the females with 
pints and teaspoons. Yet this does not 
provide the whole answer, for it is apparent 
that both men and women have been 
exposed to all of these items a number of 
times, surely enough for learning. We are 
all familiar with people who know the 
number of feet in a mile and the formula 
E = mc² but are unable to remember the 
number of pints in a quart. Such forgetting 
is highly selective, as was shown in 
the Levine and Murphy experiment (2). 


TABLE 4 
AVERAGE NUMBER OF ITEMS REMEMBERED (STUDENTS) 

                     Nonquantitative items      Quantitative items 
                     Avg. No.                   Avg. No.                   p 
                     Remembered    SD           Remembered    SD           (one-tail) 

Males (N = 36)       15.06 
Females (N = 74)     15.50 

[Remaining cells illegible in the source scan; one further entry, 1.42, survives without a recoverable position.] 

Experiment Three shows this difference 
appears with new material and cannot 
be attributed solely to a difference in pre- 
vious contact with the information. This 
should have implications for the teaching 
of mathematics and other subjects to fe- 
male students. Apparently the poorer per- 
formance of female students on tests of 
mathematics is more fundamental than a 
distaste for computation or algebra. Fur- 
ther research is necessary to determine the 
extent of this debility. There are some clues 
in our research that this deficiency is not 
directly related to an antipathy to all num- 
bers. The females excelled in recalling the 
number of pints in a quart and teaspoons 
in a tablespoon. We also administered a 
brief test of digit span to the university 
students used in Experiment Three. Al- 
though the males had surpassed the females 
in recalling the quantitative information 
from the paragraphs, there was no dif- 
ference in the recall of six and seven digits. 
This result parallels the negligible sex 
differences in recall of digits mentioned by 
Terman (4). Perhaps this indicates that 
many women are unable to retain large 
numbers (thousands or millions). An anal- 
ysis was made of the type of errors in re- 
calling the numbers from the paragraphs. 
It was found that 36% of these errors were 
due to the incorrect placement of the sig- 
nificant figures. That is, the S wrote 
“1200” or “120,000” instead of “12,000”; 
or “16,000,” “1,600,000,” or “1,016,00” instead 
of “116,000.” This type of error constituted 
40% of the incorrect responses by 
females and 27% of the incorrect responses 
by males. This difference is not significant 
and it should be realized that these are 
percentages of the incorrect responses 
given by all Ss. That is, it does not include 
Ss who did not write any figure or who 
wrote the correct figure. However it does 
show that there is need for research on 
the types of numbers that can be handled 
by men and women. If women are able to 
remember seven digits but can not re- 
member five or six digit numbers it is 
important to learn the psychological char- 
acteristics of numbers qua numbers, in- 
stead of numbers as unrelated series of 
digits. 


SUMMARY 


This study was undertaken to determine 
whether some differences between male 
and female patients seen in a hospital setting 
in the retention of quantitative information 
would be found in further tests. 
Three groups of Ss were used: 156 hos- 
pital patients, 95 U.S. college students, and 
154 Canadian college students. They were 
given several Wechsler-Bellevue Information 
items (population of U.S., pints in a 
quart, distance from New York to Paris) 
and a few other items. The results dis- 
closed that the males did better on the 
population and distance items while the 
females performed better on the pints and 
teaspoons item. It was also shown that 
males were better able to retain new quan- 
titative information when tested for im- 
mediate recall. No sex differences were 
found in remembering nonquantitative 
material. A brief digit span test also dis- 
closed no sex differences. The implications 
of this for research into selective retention 
were discussed. 
THE RURAL THIRD GRADE LEVEL¹ 


HAROLD D. HOLLOWAY 
The children’s form of the manifest 
anxiety scale (CMAS) developed by Cas- 
taneda, McCandless, and Palermo (1) 
offers a new and much needed group ad- 
ministered criterion of child behavior at 
the lower school-age levels, specifically for 
the fourth, fifth, and sixth grades. Criteria 
of these kinds are understandably more 
scarce at the first three school grades, a 
fact which prompted the present study. 

Test-retest reliability coefficients reported 
by Castaneda, et al. (1)² ranged between 
.70 and .94 for samples of fourth, 
fifth, and sixth graders on whom the CMAS 
was standardized. In a series of articles 
using the CMAS as a predictor, the same 
investigators reported significant relation- 
ships between anxiety levels and perform- 
ance on various learning tasks (2, 7), 
preponderantly negative relationships be- 
tween anxiety and popularity (6), and 
evidence to suggest that anxiety is mean- 
ingfully related to school achievement and 
intelligence for certain grade levels (5). 

The purpose of the present research was 
to repeat essentially the reliability and 


1 The study was sponsored jointly by the 
Agricultural Experiment Station and the 
College of Home Economics, Department of 
Child Development and Family Relation- 
ships of the University of Tennessee. 

The author wishes to express his grati- 
tude to the superintendent, principals and 
third grade teachers of Farragut and Blue 
Grass Elementary Schools of Knox County 
for their cooperation in making the study 
possible. The author is also indebted to the 
following persons for suggestions concerning 
the manuscript: Mary E. Keister, Boyd R. 
McCandless, Alfred Castaneda, Ruth High- 
berger, and William O. Jenkins. 

2 Not to be confused with Ref. (2)—here- 
after, Castaneda, et al. refers to (1). 


standardization study of Castaneda, et al. 
using a third grade rural level in contrast 
to their samples of fourth, fifth, and sixth 
graders enrolled in a city school system. 


METHOD 
Subjects 


A total of 121 children,³ 64 boys and 57 
girls, enrolled in four third grade class- 
rooms of two rural schools located in an 
East Tennessee county served as Ss. Three 
classrooms were in one school, and the 
fourth in another school. The schools, sep- 
arated by about five miles or less, served 
adjacent communities of less than 2500 
population. Parents of the Ss were from a 
generally low- to middle-socioeconomic 
stratum as judged by occupational data. 
The Ss were about equally distributed as to 
number and sex within classrooms. 


CMAS Description 


The CMAS consists of 53 items.⁴ Forty-two 
items were designated by Castaneda, 
et al. as “anxiety” items and formed the A 
scale (abbreviation imposed by present 
author); 11 items were “... designed to 
provide an index of the subject’s tendency 
to falsify his responses to the anxiety items 
...” (1, p. 318) and were labelled by the 
test authors as the L scale. By definition, 
the higher the A scale score, the higher the 
anxiety; and the higher the L scale score, 
the greater the tendency to falsify re- 
sponses to the A scale. 


3 One female S not included in analyses 
due to absence during second test adminis- 
tration. 

4 The items and scoring procedures may 
be found in (1, pp. 318-319). 
Procedure 


Two major steps were taken to maximize 
reading ease and comprehension: (a) In- 
structions were altered slightly so the 
teacher could give the items orally as the 
S followed on his own copy—in the Casta- 
neda, et al. study, each S read and marked 
the items by himself. (b) Items were triple- 
spaced and typewritten in capitals. The in- 
structions used in the present study are re- 
produced as follows: 


TO BOYS AND GIRLS 


Follow each question carefully as I read 
it aloud to you. When I finish reading each 
question to you, put a circle around the 
word YES if you think it is true about you. 
Put a circle around the word NO if you think 
it is not true about you. Now let us begin. 


The testing program was carried out dur- 
ing the second half of the 1956-57 school 
year. The retest interval was approximately 
one week—seven days for three groups and 
six days for the fourth. 


RESULTS AND DISCUSSION 


Scores obtained on the four groups were 
combined to form a single sample for the 
analyses. Table 1 includes the respective 
means (Ms) and standard deviations (SDs) 
for the first and second A scales (A₁ and 
A₂), and similarly, for the first and second 
L scales (L₁ and L₂). Additionally, Table 1 
contains the test-retest coefficients of re- 
liability (Pearson r) for both scales. 

As the various results are described, they 
will be compared with the Castaneda, et al. 
study. An attempt has been made to re- 
strict comparisons mainly to trends, because 
the two studies concerned different grade 
levels, slightly different administrative in- 
structions, and necessarily different error 
term components in tests of significance. 
Also, Castaneda, et al. presented only initial 
test data. 


A and L Scale Ms and SDs 


From Table 1 it is important to indicate 
that: (a) For both A and L, the Ms for 
girls (see Col. 3) were higher than the cor- 
responding Ms for boys (see Col. 1) on 
both initial and final tests; and (b) for each 
sex, the Ms of A₂ and L₂ (see Rows 2 and 
6) were less than the corresponding Ms of 
A₁ and L₁ (see Rows 1 and 5). Analyses of 
variance (Sex × Test Order) for the A and 
L scores separately resulted in one significant 
(coefficient of risk, p = .01) main 
effect: the pooled M of L₁ (4.70) was 
significantly higher than the pooled M of 
L₂ (4.02). Both interactions and other main 
effects were nonsignificant. 

Using Sex × Grade analyses of variance 
with A₁ and L₁ scores as separate criteria, 


TABLE 1 
A AND L SCALE TEST-RETEST MEANS, SDs, AND RELIABILITIES (r) 
FOR POOLED AND SEPARATE SEXES 

[Table body illegible in the source scan. Rows: A₁, A₂, Pooled A₁-A₂, and Reliability; L₁, L₂, Pooled L₁-L₂, and Reliability. Columns: M and SD for boys, girls, and pooled sexes.] 

Note. Ns: Boys = 64; Girls = 57. All rs significantly different from zero (coefficient of risk, p = .01). 





RELIABILITY OF CHILDREN’S ANXIETY SCALE 


Castaneda, et al. found that girls scored 
significantly higher than boys on both 
scales. Thus, the significant sex differences 
of the Castaneda, et al. study were con- 
firmed in direction by results of the present 
study. 

In terms of magnitude, the Castaneda, 
et al. Ms and SDs of A₁, and the SDs of 
L₁, were very close to the corresponding 
values of the present study. However, the 
Ms of L₁ of the present study were approximately 
twice the size of those reported by 
Castaneda, et al., a finding that suggests an 
interesting hypothesis for further study. 

Recall from above that on the initial 
scales there were higher anxiety levels and 
more falsification than on the final scales 
(compare pooled Ms in right-hand section 
of Table 1). Clinically speaking, it would 
seem logical to expect such a result, since 
there was perhaps some anxiety associated 
with taking the test itself the first time 
which tended to dissipate during the week 
prior to taking the test a second time. The 
L scale may be viewed then as having 
“played the role” of a defense against 
anxiety, so that more falsification occurred 
in the initial test, when Ss were presumably 
more anxious, than during the second test 
when less anxiety about test-taking was 
operating. 


A and L Scale Frequency Distributions 


Frequency polygons, smoothed by the 
method of running averages (3, pp. 52-54), 
were plotted for each sex separately and 
for both sexes combined using the A₁ and 
L₁ scale data. In general, the resultant six 
curves were unimodal, approximately bell- 


shaped, and fairly symmetrical. Most 
curves tended to possess a very slight 
positive skew but generally less skew than 
curve data presented by Castaneda, et al. 
For the pooled A₁ sample of the present 
study: median = 18.44; twentieth percentile 
(P₂₀) = 12.80; and P₈₀ = 26.20. For 
the pooled L₁ sample the same statistics 
were respectively: 4.71; 2.73; and 6.73. 
Curves were not plotted for the A₂ and L₂ 
data, but inspection of the frequency distributions 
revealed them to be very similar 
to the A₁ and L₁ data. Considering the 
frequency data in general, the findings of 
the present study and those of Castaneda, 
et al. were in strong agreement. 
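
Smoothing a frequency polygon by running averages can be sketched as below. This is a minimal illustration of a three-point running average, in which each class frequency is replaced by the mean of itself and its two neighbours and the classes beyond either end are treated as empty; whether the cited procedure (3, pp. 52-54) handles the end classes in exactly this way is an assumption, and the frequencies are hypothetical.

```python
def smooth_running_average(frequencies):
    """Three-point running average of class frequencies, padding both ends with zero."""
    padded = [0] + list(frequencies) + [0]
    return [
        (padded[i - 1] + padded[i] + padded[i + 1]) / 3
        for i in range(1, len(padded) - 1)
    ]

# Hypothetical frequencies for successive score intervals on the A scale.
raw = [3, 6, 9, 12, 9, 6]
print(smooth_running_average(raw))
```

Because each smoothed value borrows from its neighbours, isolated peaks and dips in a small sample are flattened, which makes the overall shape of the distribution easier to judge.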


A and L Scale Test-Retest Reliabilities 


The rs between A₁ and A₂ (rA₁-A₂) and 
between L₁ and L₂ (rL₁-L₂), for pooled and 
separate sexes, are shown in Table 1. 
Three noteworthy features of the reliability 
results were: (a) All rs differed significantly 
from zero, represented substantial to 
marked relationships, and were generally 
comparable in size (although slightly lower 
in the case of A) to those of Castaneda, et 
al. (b) Coefficient rs,-,, was higher for 
boys (.82) than for girls (.71), a trend con- 
sistent with the fourth grade sample of 
Castaneda, et al. but opposite to their fifth 
and sixth grade samples. These combined 
facts suggested the hypothesis that within 
the age ranges included by both studies, 
boys tend to become less consistent in their 
responses to the A scale than do girls, but 
at the same time, the responses of both 
sexes maintain a relatively high level of con- 
sistency. (c) Due to the lack of significant 
sex differences for both A and L in the 
analyses of variance, although retention of 
the hypothesis of no sex differences does 
not prove it, it seemed reasonable to regard 
the pooled sex rsas the best single reliability 
estimates, viz., ra,-a, = 83 and ri,-1, = 
.70. The corresponding single estimates re- 
ported by Castaneda, et al. were .90 and 
.70 respectively, thus the two studies agreed 
very closely in this respect. 


Correlation between the A and L Scales 


The assumptions are made that: (a) the
L scale indicates the tendency for S to
falsify answers to the A scale; and (b) an
attempt to falsify could result in either a
high or low A scale score. Ideally then,
from the standpoint of measurement, rs
between the two scales would be zero or
thereabouts. Correlations were computed
between A₁ and L₁ and between A₂ and L₂
for each sex separately and for both sexes
combined. Respectively, the rA₁-L₁ and
rA₂-L₂ coefficients for the pooled sample
were .14 and .05; for boys, .15 and .09; and
for girls, .11 and .01. None of the rs was
significantly different from zero (coefficient
of risk, p = .01). These rs were generally
comparable to those of Castaneda, et al.;
theirs were also nonsignificant, ranging
between -.11 and .22.
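
The significance check on the pooled coefficients can be sketched as follows. This is only an illustration: it assumes the pooled N of 121 given in the Summary and uses Fisher's z transformation with a normal approximation, which may not be the procedure actually followed in the study. The function name r_two_tailed_p is hypothetical.

```python
import math
from statistics import NormalDist


def r_two_tailed_p(r: float, n: int) -> float:
    """Two-tailed p for H0: rho = 0, using Fisher's z (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


N_POOLED = 121  # assumption: pooled sample size taken from the Summary

# Pooled A-L correlations reported in the text.
for label, r in [("rA1-L1", 0.14), ("rA2-L2", 0.05)]:
    p = r_two_tailed_p(r, N_POOLED)
    print(f"{label} = {r:.2f}: p = {p:.3f}; significant at .01: {p < 0.01}")
```

Under these assumptions neither pooled coefficient comes near the .01 criterion, in line with the statement above.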


General Conclusion 


The evidence obtained strongly supported 
the findings of the test constructors, Casta- 
neda, et al., who standardized their items on 
fourth, fifth, and sixth grade children. The 
principal conclusion drawn from the present 
study was that the A and L scales can be 
reliably employed as criteria using third 
grade rural children taken from populations 
similar to the one included herein. Whether 
or not the items are related to other opera- 
tionally defined concepts (validity) is a 
matter for empirical determination. 


Summary 


The main purpose was to obtain test- 
retest coefficients of reliability of the 
Children’s Form of the Manifest Anxiety 
Scale (CMAS) on a sample of 121 third 
grade rural Ss. The CMAS consisted of 42 
anxiety items (A scale) and 11 falsification 
items (L scale). The scales were adminis- 
tered to four classrooms by the respective 
teachers. The principal results, which were 
compared to those found in the reliability 
study of Castaneda, McCandless, and Pal- 
ermo, are listed as follows: 


HAROLD D. HOLLOWAY 


1. Pooled estimates of the reliabilities
(r) of the A and L scales were .83 and .79 
respectively. Correlations between the A 
and L scales approached zero. 

2. Girls scored higher than boys on both 
scales but not significantly. 

3. The general findings gave substantial 
support to those of Castaneda, et al. The 
evidence indicated that the A and L scales 
were sufficiently reliable to be used as cri- 
terion measures for samples from popula- 
tions similar to those employed in the 
study. 
Interest in the application of techniques
of self-evaluation to education and train- 
ing is a relatively recent development. 
While there has been an increasing number 
of articles which indicate favorable ex- 
perience with self-evaluation, research evi- 
dence concerning its value is lacking. 
Russell (4), in a 1953 survey of research 
on self-evaluation reports a lack of scien- 
tific study of the values of self-evaluation. 
Symonds (5) also indicates there are few 
reports on research results. Rogers (3), 
however, reports favorable experiential 
results with self-evaluation as a mode of 
appraisal. Thus self-evaluation has some 
empirical support but experimental evi- 
dence of its value is meagre. 


The study reported here was undertaken
to determine if self-evaluation is of value
in improving student achievement. In
other words, does self-evaluation give the
student a basis for improved functioning
as a student?

The study was conducted in two Air
Force Schools at Scott Air Force Base,
Illinois. Ss were Air Force enlisted
students in electronic communications
courses. Due to similarity of the students’
background, age, living conditions,
aptitudes and to close control of curriculum,
teaching methods, training materials and
aids, conditions were considered especially
favorable for a controlled study.

In conducting the study, self-evaluation
instruments developed by the
experimenter in a previous study were used
(2). In that study he concluded that in
schools of this nature, students could
reliably and validly evaluate gain in skills
and knowledges achieved in a technical
course of instruction.

The instrument (which for convenience
was called the SET) was one which
required the student to make an estimate of
the level of skill or knowledge he possessed
upon entrance into the course as well as
the skill or knowledge he attained upon
completion of the course. A sample item
from the form is shown below:


How proficient are you in using a multimeter to measure
output voltages and currents of a vacuum tube?

1  2  3  4  5  6  7  8  9

Scale anchors, from lowest to highest:
Not prepared to do the job without thorough training
Am familiar with the job but need considerably more training and practice
Can do the job with further OJT and close supervision and assistance
Can do the job if given adequate supervision
Can do the job with no supervision


The student responded to each item by
marking the scale with a check mark
(✓) to indicate the level of skill he
thought he possessed at the beginning of
the course. An X represented his estimated
attainment at the end of the course.

Using the SET as a device for
self-evaluation, an experiment was organized
in each of two schools. In School A,
approximately 100 cases were used as
control. The test group, also 100 cases, used
the above-described device to evaluate
themselves at the end of each “test
point” of instruction. One to three weeks
elapsed between each test point. Thus
each student in the test group evaluated
his progress in the course every one, two
or three weeks of the course. Each SET
encompassed skills learned in the previous
two or three weeks.

School A consisted of 20 weeks of in- 
struction for six hours per day, five days 
a week. In this case the test was closely 
controlled by the experimenter so that all 
instructors administering the SET gave 
the same instructions and administered it 
in the same manner. 

In School B approximately 75 cases 
were entered as a test group and a like 
number was used as control. In this school, 
the SET was administered by school per- 
sonnel and was handled as a normal part 
of class activity. This was done for the 
purpose of determining the effect of using 
self-evaluation in a “normal” class situation
and to determine if special conditions
of the experiment may have had effect on 
achievement results. 

Student achievement was evaluated by 
regular course tests, used as a basis for 
determining grades. These criterion meas- 
ures were carefully constructed and vali- 
dated. Split-half reliabilities ranged from 
.74 to .90. The criterion measures had
curriculum validity through construction, 
since “test blueprints” were used to de- 
rive items directly from curricula mate- 
rials. In addition each item was evaluated 
by at least three experts as to its rele- 
vance to the job for which the man was 
being trained. Other item data including 
discrimination and difficulty indexes were 
used in construction of the tests. 

Results from School A were obtained 


TABLE 1
Test Score Means and t Ratios for Test-Control Groups in School A


(N = 75) 


Raw test score means by check points 
Check point no. 





31.1 


Test group mean 


Control group mean 29.1 31.0 17.7 
t 1.61 


* Significant at .02 level. 
** Significant at .01 level. 


2.64* 3.26** 2.78** 3.25**


4 5 6 


39.8 
16.2 37.5 21.0 
2.64* 


.95 1.31


TABLE 2
Test Score Means and t Ratios for Test-Control Groups in School B

(N = 33)

Raw test score means by check points

Test group mean:     23.5  21.3  21.6  18.8  22.9  22.6  23.3
Control group mean:  21.8  21.0  20.2  17.4  22.6  17.9  18.9
t:                   1.93  .28  1.43  1.88  .36  3.07*  3.45*  3.41*  2.7

* Significant at .01 level.





SELF-EVALUATION AND ACHIEVEMENT 


from a total of 75 paired cases which re- 
mained from the original 100 paired cases. 
Others were lost due to elimination and 
other administrative problems. Students 
in test and control groups were paired by 
aptitude index based on a battery of tests 
given at induction centers. 

Table 1 shows results of all measures 
given to both groups in School A. 

All nine measures indicate a positive 
difference in favor of the test group. Five 
of the nine differences show a t which is 
significant at the one per cent or two per 
cent level. Thus evidence strongly favors 
the test group. If all measures are com- 
bined by means of the use of multiple 
critical ratio, as suggested by Chapin (1), 
an MCR of 6.40 is obtained indicating a 
highly significant difference in favor of the 
self-evaluating group. 

Results obtained from 33 matched pairs 
in School B are shown in Table 2. 

Results in School B where “normal” use 
of self-evaluation was attempted, also in- 
dicate a positive difference in favor of the 
test group on all measures. Measures num- 
ber eight and nine were not used because 
of improper administration. Four of the 
nine differences show a t which is signifi- 
cant at the one per cent level. An MCR 
of 6.17 also indicates highly significant 
difference in favor of the self-evaluating 
group. 
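
The combination of the separate t ratios into a single index can be sketched as below. This is an illustration under an assumption, not a statement of Chapin's procedure: it takes the multiple critical ratio to be the sum of the individual ratios divided by the square root of their number. Under that assumption the nine t values printed in Table 2 give about 6.17, which agrees with the figure reported for School B. The function name is hypothetical.

```python
import math


def multiple_critical_ratio(ratios):
    """Assumed form: sum of the separate ratios over the square root of their count."""
    return sum(ratios) / math.sqrt(len(ratios))


# t ratios as printed in Table 2 (School B); the last entry appears as "2.7".
school_b_t = [1.93, 0.28, 1.43, 1.88, 0.36, 3.07, 3.45, 3.41, 2.7]

mcr_b = multiple_critical_ratio(school_b_t)
print(f"MCR, School B: {mcr_b:.2f}")
```

The same function could be applied to the School A ratios once all nine values are available.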


Summary and Conclusions 


This study was accomplished to deter- 
mine the effect of periodic self-evaluation 
on student achievement. Students in two 
military technical schools periodically 
evaluated their skill and knowledge during 
a course of instruction, under careful control
of the experimenter in one school and
as part of the “normal” class activities in 
the second. Achievement of test groups on 
regular school tests was compared with 
that of control groups which were 
matched by aptitude. The results favored 
the self-evaluation group, with multiple 
critical ratios being statistically significant 
in both schools. The results lead to the 
conclusion that in this particular situation 
students, given formal and periodic op- 
portunities to evaluate themselves, can 
achieve to a greater degree than students 
not having such opportunity. 

The study also raises several questions: 
Does a device such as the one used furnish 
additional motivation, sharpen percep- 
tions of the objectives to be achieved, or 
result in better organization of previous 
learning on which future learning is 
based? These and similar questions should 
furnish a basis for future study in the area 
of self-evaluation. 
Part of the contemporary pedagogical
trial-and-error efforts to improve tech- 
niques of classroom testing has focused 
upon the use of the open book examina- 
tion. In such an examination the student 
is allowed to make use of any materials at 
his disposal, including textbooks, lecture 
notes, and dictionaries, but does not ob- 
tain answers either directly or indirectly 
from other students. 

Tussing (2) summarizes the various
arguments for using the open book test as:


1. The test can be constructed and used
in all the various forms that the traditional
test can be used; 2. Much of the fear and
emotional blocks encountered by the
student is removed; 3. Emphasis is placed upon
the practical problems and reasoning, and
less emphasis is placed upon pure memory
of facts and items; 4. Cheating with cribs
and other devices is eliminated; 5. This
approach is more adaptable to evaluating
student attitudes and posing the question of
what action should be taken on social
issues.


Some of the arguments opposing the use 
of the open book should also be recog- 
nized; namely, (a) It is likely to reduce 
study by allowing some students to feel 
that the use of the book will enable them 
to “slide through” with a minimum of 
study; (b) There is some reason to believe 
that a certain amount of rote memory 
may bring about the overlearning so often 
necessary to a full understanding of a
subject; (c) Note-passing and looking at the
test paper of a nearby student is made 
easier in the confusion of looking through 
papers and books; (d) A more superficial 
knowledge of the material is encouraged. 

The author wishes to express his thanks
to W. E. Vinacke and John Digman for
their help with the manuscript.


PROBLEM 

The present study is an attempt to de- 
termine the equivalence of two approaches 
to the administration of examinations; 
namely, the conventional closed book ver- 
sus the open book. The general hypotheses 
are: (a) The open book examination will 
lead to fewer student errors; (b) The
open book examination will measure dif- 
ferent abilities than those assessed by the 
closed book tests; and (c) There is no 
correlation between student ratings of the 
help received from open book examina- 
tions and their test scores. 

The first hypothesis is based on the ap- 
parent truism that the opportunity to 
look up material at its source should pro- 
vide greater accuracy of response than 
depending upon memory. The null hy- 
pothesis may be stated as follows: An ex- 
perimental group, receiving an open book 
examination, will not differ significantly 
in terms of total errors from a control 
group which receives the same examina- 
tion under the traditional method. 

The second hypothesis is based on the 
assumption that certain individuals will 
do better work on a closed book test while 
others will do relatively better on an open 
book examination, the differences being 
functions of differential responses to the 
pressure of the examination situation, an 
altering of motivation in studying, the 
ability to make organized use of texts and 
notes, etc. Specifically, then, the null
hypothesis will state: The correlation between
two closed book examinations will
not differ significantly from the correla- 
tion between an open book and a closed 
book examination, assuming the sets of 
examinations and testing conditions are 
equivalent in every way. 
OPEN BOOK EXAMINATIONS


The third hypothesis was based pri- 
marily upon the “educated guess” of the 
investigator which, in turn, was based on 
casual observations of the grades of stu- 
dents taking open book tests. The null hy- 
pothesis states: There will be no difference 
in number of errors on open book exami- 
nations between groups of students who 
rate open book examinations as being 
helpful and those who rate them as being 
nonhelpful. In this case, it is predicted 
that the null hypothesis will be accepted. 


METHOD


Subjects. The subjects were 158 stu- 
dents at the University of Hawaii, 85% 
women and 75% sophomores. Seventy- 
four of the students were enrolled in one 
section of child psychology, while the re- 
maining 84 were enrolled in another sec- 
tion of the same course with the same in- 
structor. 

Examinations. Two mid-term examina- 
tions, approximately six weeks apart, were 
given to both sections. Each examination 
consisted of 50 questions, all multiple- 
choice items with five alternative re- 
sponses, only one of which was acceptable 
as correct. About one half the items were 
based on the class lectures; the other half 
were drawn from the text. Included were 
items which were distinctly factual in 
nature, items which attempted to get at 
understanding of relationships, and items 
which measured an understanding of ter- 
minology. It is believed that the tests 
were fairly typical of college tests of the 
multiple-choice variety. 

The students were allowed 50 minutes 
for the examination, from the time the 
tests were completely distributed until 
the time they were collected. The score 
was the total number of errors, a high 
score thus indicating a low grade. 

Procedure. The two sections, meeting in 
the same room at successive hours, were 
given the same six-week examinations on 
the same day at successive hours. Both 
sections had covered the same material
during the class periods, had the same as- 
signments, and received the same exami- 
nations. Opportunity for communication 
between sections was eliminated by keep- 
ing the students from the first section 
in the classroom until the end of the ex- 
amination period, then releasing them by 
a back door while ushering the second 
class in through the front. There was 
little if any possibility for passing on 
questions and answers. 

The first section to meet (Class A) was 
given the usual closed book examination 
for both six-week examinations. The sec- 
ond section (Class B) was given the same 
examinations, but only the first examina- 
tion was in normal closed book form. At 
the first class meeting following the exami- 
nation it was announced for the first time 
that the next examination would be “open 
book.” Class B’s second examination was 
taken with the use of textbooks, notes, 
and dictionaries. Otherwise, testing con- 
ditions were exactly the same as for Class 
A. 

Most students in Class A had from 15 
to 20 minutes remaining after completing 
the second examination, so it was assumed 
that most students in Class B should have 
had approximately that much time to look 
for answers in the material available to 
them. 

At the close of the examination period, 
Class B was asked to indicate how much 
help the open book procedure provided 
by writing “None,” “Little,” “Some,” or 
“Much” on their answer sheet. 

Replication. A replication, comparable 
in every way except one, was conducted 
with 161 students, divided into Class A’ 
(N = 79) and Class B’ (N = 82). The 
one way in which the replication differed 
from the original study was that the two 
classes were not held in the same class- 
room and that communication between the 
two groups was not so well controlled. It 
is still highly unlikely that communication 
occurred to the extent of influencing the
study. 


RESULTS


To test the first null hypothesis, it was 
first necessary to determine whether the 
two sections (Class A and Class B in the 
original experiment, Class A’ and Class B’ 
in the replication) were comparable in 
ability. This was accomplished by com- 
paring their scores on the first examina- 
tion, taken by both groups under the same 
conditions. Since the first and the second 
examinations were not necessarily of equal 
difficulty, the effects of the open book ex- 
amination had to be measured in terms of 
the differences between the sections on the 
two examinations. These data are con- 
tained in Table 1. 

As can be observed in Table 1, the 
scores were relatively the same for Class 
A and Class B (and for Class A’ and Class 
B’) on the first and second examinations. 
Although in each case the Experimental 
Group obtained approximately a one- 
half point relative increase under the ex- 
perimental conditions, the difference is far 
from statistically significant. 

Therefore, Null Hypothesis 1 must be 
accepted. It would appear from the results 
that, under the given conditions, the
opportunity to use text and lecture
materials resulted in no difference in total
errors. 

To test the second null hypothesis, 
Pearson product-moment correlations 
were computed for each class. The sig- 
nificance of the difference between cor- 
relations for Class A and Class B (and 
for Class A’ and Class B’) was then 
computed. The obtained correlations and 
their significance of difference levels are 
shown in Table 2. In both the original 
experiment and the replication, the cor- 
relation for the control condition was 
substantially higher than for the experimental
condition (r’s .691 and .579
as opposed to .495 and .460), which was
as hypothesized. 

Although neither difference was
significant with the two-tailed t test (t =
1.90 and 1.00), both were in the expected
direction. The probabilities of the two 
experiments were combined according to 
the chi square method for independent 
samples suggested by Gordon, Loveland, 
and Cureton (1). The obtained chi 
square was 11.841, which is significant 
beyond the .02 level of confidence with 
four degrees of freedom. 
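
The comparison of two independent correlations described above can be sketched with Fisher's z transformation. This is an illustration only, and the test actually used may have differed. The Ns are assumptions: the section sizes given under Subjects (74 and 84; the text does not say which was Class A, but the standard error is the same either way) and under Replication (79 and 82), with every student assumed to have taken both examinations. The function name is hypothetical.

```python
import math


def corr_difference_z(r1: float, n1: int, r2: float, n2: int) -> float:
    """Critical ratio for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher z transforms
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # SE of z1 - z2
    return (z1 - z2) / se


# Correlations from the text; Ns are the assumed section sizes.
original = corr_difference_z(0.691, 74, 0.495, 84)
replication = corr_difference_z(0.579, 79, 0.460, 82)

print(f"original experiment: {original:.2f}")
print(f"replication:         {replication:.2f}")
```

Under these assumptions the two ratios come out at roughly 1.9 and 1.0, close to the values quoted in the text.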


TABLE 1
Mean Number of Errors on Examinations

Column headings:
Mean Number Errors, Both Closed Book (Exam I)
Mean Number Errors, Experimental Condition (Exam II)

(Only one column of entries is recoverable; which examination it belongs to is not clear.)

Experiment
  Control (Class A)         15.68
  Experimental (Class B)    13.10
  Difference                 2.58
Replication
  Control (Class A’)        13.41
  Experimental (Class B’)   14.88
  Difference                 1.47










TABLE 2
Correlations Between Scores on First Examination and Second
Examination and Tests of Significance Between Correlations

                             Correlation   t Between groups   Level of confidence
Experiment
  Control (Class A)              .691
  Experimental (Class B)         .495            1.90               .056
Replication
  Control (Class A’)             .579
  Experimental (Class B’)        .460            1.00

Combined χ² = 11.841
P < .02








TABLE 3
Change in Number of Errors From First Examination to Second
Examination for the Experimental Groups as a Function
of Attitudes Regarding Open Book Methods

Extent of Help Received From Open Book Examination
Column headings: Much help; Some help; Little help; No help (N and change for each)
Recovered entries (column positions uncertain): +4.74; +0.12; 15


Therefore, the second null hypothesis
was rejected. It appears that a
significantly lower correlation is obtained when
an open book examination is given
following a closed book examination than
when both examinations are the closed
book type.

For the final null hypothesis, the
students were asked to indicate whether they
felt the open book examination had been
of “Much,” “Some,” “Little,” or “No”
help. As may readily be observed in
Table 3, there was virtually no difference
among the four groups of students
indicating the four attitudes towards open
book tests.

Not only are the differences among
the groups slight, but for the first study,
those who felt the open book was “Little”
help did relatively better than those who
felt it was “Much” help; in the replication
a similar reversal occurred between
“Some” help and “Little” help.

Therefore, the third null hypothesis was 
accepted, which was in accordance with 
the corresponding general hypothesis. It 
appears that the feelings of students re- 
garding help given by an open book ex- 
amination are not reflected in measured 
grade changes. 


Discussion 


This study has investigated the equiv- 
alence of the open book examination and 
the closed book examination. The results 
have indicated that, although under the 
conditions of this experiment the group 
average scores are not affected by the 
examination approach, the two types of 
examinations measure significantly dif- 
ferent abilities. 

While recognizing the obvious dangers 
of over-generalizing, it is felt by the author
that the experimental situation is
sufficiently “typical” of college examina- 
tions of the multiple-choice variety to 
have widespread applicability. 

It can be assumed, therefore, that some 
students do relatively better on open 
book examinations, while others do better 
on closed book examinations. Use of the 
open book technique, thus, would appear 
more rewarding to certain students than 
to others, and the belief that there is 
no difference between the two types of 
examination is of dubious validity. 

Therefore, if an instructor feels, with 
Tussing, that the open book examination 
is a more valid measure due to the de- 
crease in reliance on memory and de- 
traction from cheating, the open book 
approach would be most appropriate. On 
the other hand, if he feels that the closed 
book type provides more study motiva- 
tion and encourages a less superficial ap- 
proach to a course, he will undoubtedly 
adhere to the traditional examination. 

This study has shown that the two 
types of examinations measure signifi- 
cantly different abilities. It will now be 
necessary to investigate what factors dif- 
ferentiate students who are successful on 
each of the types of tests, so that in- 
structor decisions might be based on more 
complete information. 


RICHARD A. KALISH 


SUMMARY


An investigation was made of the 
equivalence between open book examina- 
tions and closed book examinations. 

Two traditional closed book examina- 
tions were administered to a class of 
University of Hawaii students; the same 
examinations were administered to an- 
other section of students taking the same 
course with the same instructor, differing 
only in that the second examination was 
open book. A replication is also reported. 

Three hypotheses were tested: 1. The 
open book examination will lead to fewer 
student errors; 2. The open book exam- 
ination measures different abilities than the 
closed book examination; 3. Student rat- 
ings of the help received from open book 
examinations will not be related to ex- 
amination scores. The first hypothesis was 
not substantiated, but the second and 
third hypotheses were verified. 
THE STANDARD ERROR OF MEASUREMENT
OF THE DIFFERENCE BETWEEN A SUM 
SCORE AND ONE OF ITS PARTS 


FREDERICK B. DAVIS 
For proper interpretation of an
individual’s test scores it is sometimes
necessary to ascertain the significance of the
difference between a total score, consisting 
of the sum of several part scores, and one 
of those part scores. For example, the 
Level-of-Comprehension score from the 
Davis Reading Test (2) is based on the 
first 40 of the 80 items that determine the 
Speed-of-Comprehension score. A differ- 
ence between individual speed and level 
scores should be evaluated in terms of the 
standard error of measurement of the 
difference between a sum (the Speed-of- 
Comprehension score) and one of its parts 
(the Level-of-Comprehension score). 

For the convenience of clinical and 
school psychologists, the equations for 
computing the standard error of a differ- 
ence between overlapping total and part 
scores obtained by an individual drawn at 
random from a specified group will be pre- 
sented first and their use illustrated with 
data from the Davis Reading Test. The 
derivation of these new equations will 
then be provided. 


PRACTICAL PROCEDURES


Let T represent an individual’s raw
score made up of m parts. Then T = A +
B + ... + M. Let I represent any part of
sum T, and P any part of sum T except I.
Differences between sum T and any one
of its parts are inconvenient to interpret
unless all of the scores are made
comparable. For purposes of this discussion,
comparable scores are defined as
transformed raw-score values for which the
corresponding true-score points are
exceeded by the same percentage of examinees
in a defined sample. The desired raw-score
values may be determined by the method
given by Flanagan (3, pp. 752-760).
They are transformed simply to make
numerically identical comparable scores
for which corresponding true-score points
are exceeded by the same percentage of
examinees in the defined sample. For
example, for Form A of the Davis Reading
Test a speed score of 31 and a level score
of 21 are raw-score values for which the
corresponding true-score points are
exceeded by about 61 per cent of the
examinees in the equating sample (which
comprised 4,692 students in Grades 11 and
12 and the freshman year of college).
These two raw-score values have been
transformed into comparable scores of 75.
It should be made clear that comparable
scores, as defined above, are not
necessarily measures of the same abilities or
equally reliable.

Fortunately, total and part scores are 
often expressed in serviceable approxima- 
tions to comparable scores. For example, 
Verbal, Performance, and Full-Scale IQ’s 
from the Wechsler intelligence scales 
are expressed in units such that their 
means are approximately 100, their 
standard deviations about 15, and the 
shapes of their distributions nearly normal. 
Similarly, total and part scores from the 
Cooperative Achievement Tests are ex- 
pressed in Scaled Scores that have means 
and standard deviations (in a defined 
hypothetical group) of 50 and 10, respec- 
tively, and distributions that are closely 
normal. 
Suppose that the raw scores (T, A,
B, ..., M, as well as I and P) of the
individual mentioned above are expressed
in comparable form and denoted $Z_T$,
$Z_A$, $Z_B$, ..., $Z_M$; and that $Z_I$ denotes a
comparable score on any part of sum T
and $Z_P$ any comparable part score except
$Z_I$. Then it should be noted that $Z_T \neq
Z_A + Z_B + \cdots + Z_M$. The standard
error of measurement of the difference
between $Z_T$ and $Z_I$ may be written as:

\[
s_{\mathrm{meas}(Z_T - Z_I)} = \frac{s_{(Z_T - Z_I)}}{s_{(T - I)}}
\sqrt{\sum^{m-1} s_P^2 \, (1 - r_{PP'})} \qquad [1]
\]

or,

\[
s_{\mathrm{meas}(Z_T - Z_I)} = \frac{s_{(Z_T - Z_I)}}{s_{(T - I)}}
\sqrt{\sum^{m-1} s_{\mathrm{meas}_P}^2} \qquad [2]
\]

or,

\[
s_{\mathrm{meas}(Z_T - Z_I)} = s_{(Z_T - Z_I)}
\sqrt{1 - r_{(T - I)(T' - I')}} \qquad [3]
\]

where:

\[
s_{(Z_T - Z_I)} = \sqrt{s_{Z_T}^2 + s_{Z_I}^2 - 2 s_{Z_T} s_{Z_I} r_{TI}} \qquad [4]
\]

\[
s_{(T - I)} = \sqrt{s_T^2 + s_I^2 - 2 s_T s_I r_{TI}} \qquad [5]
\]

\[
r_{(T - I)(T' - I')} =
\frac{s_T^2 r_{TT'} + s_I^2 r_{II'} - 2 s_T s_I \hat{r}_{TI}}
     {s_T^2 + s_I^2 - 2 s_T s_I r_{TI}} \qquad [6]
\]

and

\[
\hat{r}_{TI} = r_{TI} - \frac{s_I (1 - r_{II'})}{s_T} \qquad [7]
\]

In the preceding equations, $s_T^2$, $s_I^2$,
and $s_P^2$ and $r_{TT'}$, $r_{II'}$, and $r_{PP'}$ are the
variances and reliability coefficients,
respectively, of these variables in the original
raw-score units of measurement. The
correlation of sum scores and any given
set of part scores, expressed in these
original units of measurement, is denoted
as $r_{TI}$. Variances of the transformed
comparable scores are denoted as $s_{Z_T}^2$,
$s_{Z_I}^2$, and $s_{Z_P}^2$.

Whether the difference $Z_T - Z_I$ for
any pupil chosen at random from the
group tested may be regarded as a chance
deviation from a true difference of zero
at any desired level of confidence may be
determined with serviceable accuracy by
means of the critical ratio:

\[
CR = \frac{(Z_T - Z_I) - 0}{s_{\mathrm{meas}(Z_T - Z_I)}} \qquad [8]
\]


Choice among Equations [1], [2], and 
[3] for computing the standard error of 
measurement of a difference depends on 
which one can be employed most conven- 
iently with the data available. To deter- 
mine the standard error of measurement 
of the difference between the speed and 
level scores from the Davis Reading Test 
for a college freshman drawn at random 
from a group tested, Equation [1] is most 
convenient. The test results give the 
standard errors of measurement of these 
scores, in terms of the original raw-score 
units of measurement, as 5.5 for speed 
and 3.7 for level. Equation [12], therefore,
yields $(5.5)^2 - (3.7)^2$, or 16.56, as the
numerical value under the radical sign in
Equation [1]. Numerical values of the
terms in Equations [4] and [5], also given in
the manual, lead to a value of .35 for the
ratio of $s_{(Z_T - Z_I)}$ to $s_{(T - I)}$. The standard
error of measurement of the difference 
turns out to be 1.4. When this value is 
used in Equation [8], a difference of 2 
points is found to be significant at about 
the 15 per cent level and one of 3 points 
at about the 3 per cent level. For a college 
student drawn at random from the group 
tested, a counselor or teacher would be 
justified in concluding that a difference 
between his speed and level scores of 3 
points or more should be attributed to
causes other than chance. It may occasion 
some surprise to find that the standard 
error of measurement of the difference 
between these two comparable scores is so 
small. This is largely accounted for by the 
fact that errors of measurement in them 
are positively correlated. 
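The arithmetic of this example can be retraced in a few lines. The sketch below is illustrative only: it takes the speed score as the sum score and the level score as the part score, uses the figures quoted in the text (5.5, 3.7, and .35), and assumes that the critical ratio of Equation [8] is referred to a two-tailed normal distribution, which the text does not state explicitly.

```python
import math

sem_speed = 5.5   # standard error of measurement of the speed score (raw units)
sem_level = 3.7   # standard error of measurement of the level score (raw units)
ratio = 0.35      # ratio of s_(Z_T - Z_I) to s_(T - I), as quoted in the text

radicand = sem_speed**2 - sem_level**2      # Equation [12]
sem_diff = ratio * math.sqrt(radicand)      # Equation [1]


def two_tailed_p(difference, sem):
    """Two-tailed normal probability for the critical ratio of Equation [8]."""
    cr = abs(difference) / sem
    return math.erfc(cr / math.sqrt(2))


p2 = two_tailed_p(2, sem_diff)   # difference of 2 points
p3 = two_tailed_p(3, sem_diff)   # difference of 3 points
print(round(radicand, 2), round(sem_diff, 1), round(p2, 2), round(p3, 3))
```

Carrying the unrounded standard error through the calculation gives probabilities close to the "about 15 per cent" and "about 3 per cent" levels reported above.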


DERIVATION OF EQUATIONS 


Let T represent a raw total score made
up of m parts: A, B, ..., M. Then $T = A + B + \cdots + M$.
Let I represent any
part of T and let P represent any part of
T except I. Assume that an indefinitely
large number of parallel forms of the
tests from which these raw scores are
derived are given to a pupil drawn at
random from a grade group for which the
tests are appropriate and postulate that
this pupil’s true scores in the abilities
tested remain constant throughout the
testing. An essentially normal distribution
of differences between T and I would
then be obtained. The mean of the distribution
would approach $T_t - I_t$ (the
difference between the pupil’s true scores),
and its variance could be written as:

$$
\begin{aligned}
s_{(T-I)}^2 = s_{[(T_t + T_e) - (I_t + I_e)]}^2 ={}& s_{T_t}^2 + s_{I_t}^2 + s_{T_e}^2 + s_{I_e}^2 - 2 s_{T_t} s_{I_t} r_{T_t I_t} \\
&+ 2 s_{T_t} s_{T_e} r_{T_t T_e} + 2 s_{I_t} s_{I_e} r_{I_t I_e} \\
&- 2 s_{T_t} s_{I_e} r_{T_t I_e} - 2 s_{T_e} s_{I_t} r_{T_e I_t} \\
&- 2 s_{T_e} s_{I_e} r_{T_e I_e}
\end{aligned}
\tag{9}
$$

where the subscript t denotes a true score
and the subscript e an error of measurement.

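Since we postulated that the pupil’s
true scores remain constant, $s_{T_t}^2$ and
$s_{I_t}^2$ are both equal to zero, so every term of
Equation [9] that contains a true-score standard
deviation vanishes. Consequently,
Equation [9] may be simplified to:

$$s_{\mathrm{meas}(T-I)}^2 = s_{T_e}^2 + s_{I_e}^2 - 2 s_{T_e} s_{I_e} r_{T_e I_e} \tag{10}$$
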
It can easily be shown that the coefficient

$$r_{T_e I_e} = \frac{s_{I_e}}{s_{T_e}} \tag{11}$$


By definition, $s_{T_e}^2$ is equal to $s_{\mathrm{meas}_T}^2$ and
$s_{I_e}^2$ is equal to $s_{\mathrm{meas}_I}^2$. Therefore, Equation
[10] may be simplified to:

$$s_{\mathrm{meas}(T-I)}^2 = s_{\mathrm{meas}_T}^2 - s_{\mathrm{meas}_I}^2 \tag{12}$$

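If we make the usual assumption that the
correlation of errors of measurement of
separate tests will, under proper conditions
of test administration, be zero, we may
write:

$$
\begin{aligned}
s_{\mathrm{meas}_T}^2 = s_{T_e}^2 &= s_{(A_e + B_e + \cdots + M_e)}^2 \\
&= s_{A_e}^2 + s_{B_e}^2 + \cdots + s_{M_e}^2 + 0 \\
&= \sum_{P=1}^{m-1} s_{\mathrm{meas}_P}^2 + s_{\mathrm{meas}_I}^2
\end{aligned}
\tag{13}
$$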


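If a substitution is made for $s_{\mathrm{meas}_T}^2$ in
Equation [12], we obtain:

$$
s_{\mathrm{meas}(T-I)}^2 = \sum_{P=1}^{m-1} s_{\mathrm{meas}_P}^2 + s_{\mathrm{meas}_I}^2 - s_{\mathrm{meas}_I}^2 = \sum_{P=1}^{m-1} s_{\mathrm{meas}_P}^2
\tag{14}
$$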
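The step from Equation [12] to Equation [14] can be illustrated numerically. The following sketch is a simulation under stated assumptions, not part of the original derivation: it draws independent normal errors for three hypothetical parts (the error standard deviations 2.0, 3.0, and 1.5 are invented), forms the error in the total and in the part I, and compares the variance of their difference with the two sides of the equations.

```python
import random
import statistics

rng = random.Random(0)            # fixed seed so the run is repeatable
error_sds = [2.0, 3.0, 1.5]       # hypothetical error SDs for parts I, B, and C
n = 50_000                        # number of simulated parallel-form testings

diffs, totals, parts = [], [], []
for _ in range(n):
    errors = [rng.gauss(0.0, sd) for sd in error_sds]
    t_e = sum(errors)             # error of measurement in the total score T
    i_e = errors[0]               # error of measurement in the part score I
    totals.append(t_e)
    parts.append(i_e)
    diffs.append(t_e - i_e)

var_diff = statistics.pvariance(diffs)                # left side of [12] and [14]
var_total = statistics.pvariance(totals)
var_part = statistics.pvariance(parts)
expected = sum(sd**2 for sd in error_sds[1:])         # right side of [14]

print(round(var_diff, 2), round(var_total - var_part, 2), expected)
```

Because the error in I is contained in the error in T, the two are positively correlated, and the variance of the difference falls below the sum of the two error variances.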

If sum T and each of its parts are
transformed into comparable scores, as
defined previously, we obtain Equations
[1] and [2] by multiplying each side of
Equation [14] by

$$\frac{s_{(Z_T - Z_I)}^2}{s_{(T-I)}^2}$$

and substituting

$$\frac{s_P^2}{s_{Z_P}^2} \, s_{\mathrm{meas}_{Z_P}}^2$$

for $s_{\mathrm{meas}_P}^2$.


Equation [3] is a specific application of
the well-known relationship:

$$s_{\mathrm{meas}_X} = s_X \sqrt{1 - r_{XX'}}$$

Equations [4], [5], and [6] are well known
and the derivation of Equation [7] has been
published (1).

REFERENCES

1. Davis, F. B. Note on part-whole correlation. J. educ. Psychol., 1958, 49, 77-79.
2. Davis, F. B., & Davis, C. C. Davis Reading Test. Series 1, for High School and College Students. Forms A, B, C, and D. New York: Psychological Corp., 1957.
3. Flanagan, J. C. Units, scales, and norms. In E. F. Lindquist (Ed.), Educational measurement. Washington: American Council on Education, 1951.

A NOTE ON SEX EQUALITY IN THE INCIDENCE
OF LEFT-HANDEDNESS

It is commonly accepted that comparisons
of behavior in different cultures may
provide data which are decisive for psy- 
chological theories. Yet the number of 
cross-cultural studies which have been 
used to test hypotheses is small. The pres- 
ent study may provide an additional dem- 
onstration of the value of cultural com- 
parisons. 

As background data let us note that 
reviews of data on handedness such as 
those by Wile (1) and Hildreth (2) show 
that while the frequency of left-handedness 
varies from activity to activity, in almost 
all studies the use of the left hand is more 
common among males than among females. 
The number of left-handed males has
been found to exceed the number of left-handed
females by 50% or by an even
greater amount. This difference is present
at least by four years of age and perhaps 
earlier. 

The consistency of this finding suggests 
that this sex difference may have a bio- 
logical basis. However, practically all of 
the investigations of hand preference have 
been conducted in Europe and America. 
For this reason the possibility exists that 
the lower frequency of left-handedness 
among women than among men is not 
biologically determined but rather that it 
may be a consequence of stronger social 
pressures against the use of the left hand 
among females than among males in western
countries. In view of these rival interpretations
of the data just reviewed, it
seems worthwhile to report that in one
Near Eastern country and probably in a
much wider area the sex difference in handedness
found in western countries does
not exist.

* This study was conducted while the
author was a visiting professor at the American
University of Beirut. Expenses of the
investigation were defrayed by a grant from
the Rockefeller Brothers Fund. Adele Hamdan
Taky Din and Leila Biksmati served as
research assistants.

The present study was conducted in 
the schools of Beirut, Lebanon, in 1955-56. 
Eleven schools were studied. In each class- 
room, a research assistant observed the 
handedness of children when they were 
engaged in writing in connection with their 
usual school work. In each school all grades 
from the kindergarten through Grade 5 
were observed. Some of the schools were 
coeducational, others were not. In the case 
of noncoeducational schools, an attempt 
was made to match boys’ schools and girls’ 
schools with respect to socioeconomic class 
and religious affiliation. 

A total of 2,656 pupils, 1,430 boys and
1,226 girls, were observed. The frequency
of left-handedness was found to be 5.0%
among the boys and 4.9% among the girls.
The small difference is not statistically
significant at the 5% level.
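The note does not say which test of significance was applied. As one plausible check, the sketch below runs a two-proportion z test on the reported sample sizes and percentages; the choice of test and the use of the rounded percentages in place of the raw counts are assumptions made for illustration.

```python
import math

n_boys, n_girls = 1430, 1226
p_boys, p_girls = 0.050, 0.049      # reported proportions of left-handed pupils

# Pooled proportion under the hypothesis of no sex difference
pooled = (p_boys * n_boys + p_girls * n_girls) / (n_boys + n_girls)
se = math.sqrt(pooled * (1 - pooled) * (1 / n_boys + 1 / n_girls))
z = (p_boys - p_girls) / se
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-tailed normal probability

print(round(z, 2), round(p_value, 2))
```

A z value this far below 1.96 is consistent with the statement that the difference is not significant at the 5% level.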

There is no reason to suppose that there 
are biological differences between Western 
and Near Eastern populations which would 
affect sex differences in handedness. The 
explanation of the difference between the 
finding just reported above and earlier 
findings probably is to be found in dif- 
ferences in social norms and child rearing 
practices. It seems likely that in the Near 
East left-handedness is no more repre- 
hensible in women than it is in men. How- 
ever, the precise attitudes and cultural 
conditioning in respect to handedness in 
this area can be identified only by further 
research. 
In further investigations it will be determined
how widespread among Near Eastern
peoples is the sex equality in
handedness ratios found in Lebanon. 

This study suggests that it may be pos- 
sible for a society to produce more sin- 
istrality among females than among males. 
Whether such a society can be found re- 
mains to be determined. 

INSTRUCTOR EFFORT TO INFLUENCE: AN EXPERIMENTAL

Instructors are frequently confronted
with problems concerning the extent to 
which they should attempt to influence 
pupils in their attitudes and other be- 
haviors having strong emotional over- 
tones. They are uncertain about what 
techniques of influence are legitimate and 
the extent to which they should be per- 
suasive. Some educational leaders are cur- 
rently calling for teachers to be more per- 
suasive in their efforts to influence pupil 
behavior. Others maintain that high pres- 
sure methods cause resistance or that such 
methods do violence to our democratic 
ideals. 

Although most supervisors and instruc- 
tors readily admit the importance of at- 
titudes and other behaviors having strong 
emotional overtones, teachers generally 
have been reluctant to attempt to in- 
fluence such behaviors. Especially have 
they shrunk from direct influence at- 
tempts. There has been a fairly pervasive 
attitude that a student’s personal and 
social attitudes, emotional reactions, and 


* This report is based on work done under 
ARDC Project No. 7723, Task No. 77461, 
in support of the research and develop- 
ment program of the Air Force Personnel 
and Training Research Center, Lackland Air
Force Base, Texas. Permission is granted 
for reproduction, translation, publication, 
use, and disposal in whole and in part by 
or for the United States Government. The 
opinions or conclusions expressed or implied 
herein are those of the authors. They need 
not be construed as necessarily reflecting 
the views or endorsement of the Depart- 
ment of the Air Force or of the Air 
Research and Development Command. 
the like are his personal business. This
has been especially true in regard to 
matters affecting physical and mental 
health, eating, sleeping, sexual behavior, 
and the like. Seldom have such matters 
been chosen for scientific investigation 
and appropriate research situations have 
not been readily accessible. 

Little or no scientific information exists 
to guide the decisions instructors and 
supervisors must make in deciding what 
kinds of effort to influence should be 
made. In social psychology, attention has 
been focused upon influence among group 
members (5), the influence of group 
norms (1), and the influence of associ- 
ates in buying, politics, and the like (4). 
In the sales field, much has been said 
(2) about “low-pressure” selling in con- 
trast to “high-pressure” selling. More 
recently, “no-pressure” selling appears to 
be coming into prominence (2). Little 
scientific research of an experimental na- 
ture has accompanied these trends, how- 
ever. 

One difficulty which has hampered re- 
search concerned with emotional reac- 
tions has been the unavailability of sat- 
isfactory criteria. Too frequently, it has 
been necessary to accept verbal expres- 
sions concerning such reactions. Even 
when it has been possible to obtain other 
behavioral measures, there has been doubt 
concerning the “real” emotional reaction 
behind the overt behavior. The authors 
have been fortunate in having access to 
a situation which provided a variety of 
criteria, including verbalized attitudes, 
overt behavior, and an indicator of emotional
response. The experimental situation
involves the use of a survival ration
commonly known as pemmican in the 
simulated survival exercise of the USAF 
Survival Training School. Use of the ra- 
tion almost always elicits a wide range 
of response from extremely unfavorable 
to extremely favorable. Since the ration 
is recognized by most authorities in the 
field as the best available one for use 
in most survival situations, its use in 
training should increase its acceptability 
and, in fact, does (9). 

An earlier study by the authors (11) 
gave a somewhat discouraging picture 
of the instructor’s ability to influence the 
acceptability of the ration. When given 
scientifically developed information about 
the psychological, social, and training 
factors related to the ration’s accept- 
ability and asked to use this information 
on behalf of their crews, aircrew com- 
manders (indigenous leaders) were far 
more successful than the crew instructors 
(11). Furthermore, those instructors who 
made the most effort to influence ac- 
ceptability (as measured by statements 
made by both the instructor and the 
trainees) tended to obtain the lowest 
acceptability. Sustained efforts by indig- 
enous leaders, however, were rewarded 
by increased acceptance. 

Instructors in this and other situations 
frequently must face the very realistic 
problem of influencing the attitudes and 
emotionally toned behaviors of their stu- 
dents. Thus, it is evident that there is 
a need for a clearer understanding con- 
cerning what it is that instructors do 
which produces negative effects and what 
they can do to exert more positive in- 
fluence. The purpose of the present study 
was to evaluate experimentally six al- 
ternative procedures by which training 
instructors may influence the acceptabil- 
ity of pemmican. 

PROCEDURES


The Ss of the study were 427 aircrew- 
men undergoing survival training. All Ss 
received a double issue of the emergency 
ration consisting of a total of eight meat 
bars (pemmican) supplemented by chili 
and onion powder, two cereal bars, two 
fruitcake bars, 16 cubes of sugar, and 
eight packets each of soluble coffee and 
tea. During the nine-day simulated sur- 
vival, escape, and evasion exercises, train- 
ees were able to supplement these rations 
to some extent by such native foods as 
porcupine, crawfish, wild onions, water 
cress, camas, and the like.

A total of 43 instructors in the two 
successive classes were involved. Prior to 
the exercise, the training groups (crews 
consisting of 9 or 12 men each) were 
divided randomly into one control and 
six experimental groups. In each class, 
three training groups were involved in 
each of the experimental groups. In the 
first class, four groups were assigned to 
the control condition and in the second, 
three groups. 
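The allocation just described can be tallied to confirm that it accounts for all 43 instructors. The sketch below assumes one instructor per training group, which the text implies but does not state; the condition labels are shorthand introduced here.

```python
conditions = ["Control"] + [f"Experimental {k}" for k in range(1, 7)]

# Training groups per condition in each class, as described in the text
groups_per_class = [
    {c: (4 if c == "Control" else 3) for c in conditions},   # first class
    {c: 3 for c in conditions},                               # second class
]

groups_by_condition = {
    c: sum(cls[c] for cls in groups_per_class) for c in conditions
}
total_groups = sum(groups_by_condition.values())
print(groups_by_condition, total_groups)
```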

Instructors were briefed by three ex- 
perienced psychologists thoroughly famil- 
iar with survival ration indoctrination 
and other aspects of the program of the 
USAF Survival Training School. The gen- 
eral purposes and design of the study 
were explained briefly. The instructors 
were asked to forgo their usual indoc- 
trination procedures and use only the 
technique they would be assigned. In- 
structors then met in groups of three or 
four, as the case might be, with one of 
the experimenters to discuss the technique 
to which they had been assigned. Prior 
to the discussion, each instructor com- 
pleted a questionnaire in which he in- 
dicated his personal reaction to the ration 
and described his usual indoctrination 
procedures. Each instructor was also 
given a typed sheet of instructions to be 
used as a guide in carrying out his assigned
technique. The members of the
control group were subjected to only the
normal influences of the training situation.

The six experimental conditions may be
described briefly as follows:


Experimental 1 (No Influence). Instruc- 
tors were briefed to make no effort to in- 
fluence trainees to accept the ration. They 
were instructed to say as little about it as 
possible, assuming a rather neutral stand 
on acceptability. They were cautioned, how- 
ever, to avoid giving any impression of 
personal dislike for the ration. (This con- 
dition was designed to follow up a clue 
obtained from an exploratory study which 
indicated that trainees perceiving “no in- 
fluence” attempts on the part of their 
instructors responded more favorably than 
those perceiving various degrees of efforts 
to influence.) 

Experimental 2 (Good Example). Instructors
were briefed to make no direct
attempts to influence trainees to accept 
pemmican. They were issued a supply of 
the ration and instructed to make a defi- 
nite attempt to manifest a definitely favor- 
able attitude by personal example. This 
was done by eating the ration and casu- 
ally expressing favorable reactions to it. 
They were cautioned to make no appeal 
to the trainees to eat the ration. (This 
condition was designed to evaluate the 
effectiveness of the often used admonition 
to instructors to “set a good example” and 
“never ask your students to do anything 
that you do not do.”) 

Experimental 3 (Information). Instructors 
were asked to give information about the 
value of the meat bar as an emergency 
ration and about ways of preparing it. They 
were instructed to give this information 
in an objective, factual, “take-it-or-leave- 
it” manner and to give no information 
about psychological reactions. (This con- 
dition was designed to evaluate what was 
considered a “low-pressure” technique of 
influence.) 

Experimental 4 (Group Explanation). In 
addition to giving facts about values and 
ways of preparation, instructors were asked 
to emphasize the psychological factors 
which affect acceptability and to explain 
why this particular ration is used in the 
training exercise. The information about 
psychological influences was derived from 
previous research by the authors (8, 9, 10).
(This condition was designed to test the 
value of using information gained through 
research to provide a psychological ex- 
planation of behavior.) 

Experimental 5 (Individual Explanation).
Instructors were briefed as in Experimental 
4 except that they were asked to work 
with individuals as individuals instead of 
with the group. They were asked to appear 
natural and casual, but sincere, in their 
attempt to exercise personal influence. 

Experimental 6 (Evaluation). Instructors 
were briefed to use the mildly coercive 
method of informing trainees that they 
would be “graded down” if they did not 
“really” try the ration. They were instruc- 
ted to explain that failure to eat the ration 
was an indication of poor “will-to-survive,” 
failure to take adaptive action, failure to 
take care of essential survival needs, failure 
to “play the game,” etc. (This condition 
was designed to evaluate the effectiveness 
of using evaluation as a device for mo- 
tivating or influencing behavior.) 


Following the field exercise, all Ss were 
administered a questionnaire to obtain
measures of acceptability and to provide 
additional facts concerning the conditions 
existing during the experiment. Accept- 
ability items included: (a) the traditional 
hedonic scale (7-point), requiring the S 
to indicate his reactions to each of five 
methods of preparing pemmican; (b) the
number of bars eaten; (c) reasons for not
eating the remainder (made me sick, too
greasy, smells bad, etc.); and (d) the conditions
under which the S would use pemmican
in the future. Previous research
(10) had indicated that each of these 
items correlates significantly with and con- 
tributes importantly to an over-all index 
of rejection. 

This over-all index of rejection was ob- 
tained by combining the items in the fol- 
lowing manner. The ratings from the 
hedonic scales were weighted from one 
point for “like extremely” to seven points 
for “dislike extremely.” If S indicated 
that he had not tried the bar according to 
one or more methods, the mean rating for 
the methods tried was assigned. One point
was scored for each bar not eaten but 
no extra credit was awarded for eating 
more than the number of bars issued. Re- 
ports of having “been made sick” added 
five points and each of the other reasons 
for not eating the remainder of the bars 
was scored one point. Five points were 
added for responses of “would eat only 
when extremely hungry” and 10 points for 
“would not eat even if very hungry.” 
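The scoring rules above can be expressed as a short function. This is a sketch under stated assumptions rather than the authors' actual scoring procedure: the text does not say how the five hedonic ratings were combined, so they are summed here; the issue is taken as eight meat bars; and the function name, argument names, and answer strings are hypothetical labels introduced for illustration.

```python
def rejection_index(hedonic, bars_eaten, reasons, future_use, bars_issued=8):
    """Sketch of the over-all index of rejection described in the text.

    hedonic     -- five ratings, 1 ("like extremely") to 7 ("dislike extremely"),
                   with None for a preparation method the S did not try
    bars_eaten  -- number of meat bars eaten
    reasons     -- set of reasons given for not eating the remainder
    future_use  -- answer to the future-use item
    """
    tried = [r for r in hedonic if r is not None]
    if not tried:
        raise ValueError("at least one preparation method must have been rated")
    mean_tried = sum(tried) / len(tried)
    # Untried methods receive the mean rating of the methods tried.
    # Assumption: the five ratings are summed.
    score = sum(mean_tried if r is None else r for r in hedonic)

    # One point per bar not eaten; no credit for eating more than issued.
    score += max(0, bars_issued - bars_eaten)

    # "Made sick" adds five points; each other reason adds one point.
    for reason in reasons:
        score += 5 if reason == "made me sick" else 1

    # Assumption: these keys stand for the two future-use answers quoted in the text.
    future_points = {"only when extremely hungry": 5, "not even if very hungry": 10}
    score += future_points.get(future_use, 0)
    return score


example = rejection_index(
    hedonic=[2, None, 4, None, 3],
    bars_eaten=5,
    reasons={"made me sick", "too greasy"},
    future_use="only when extremely hungry",
)
print(example)
```

Higher values indicate stronger rejection, so a condition that improves acceptability should lower the mean index.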


RESULTS AND CONCLUSIONS 


First, an effort was made to determine 
the over-all effects of the six experimental 
influence techniques. Means and standard 
deviations for the Rejection Index and 
number of meat bars consumed and num- 
bers and percentages for “made sick” and 
intention to eat the ration in the future
“whenever hungry” for each condition are 
shown in Table 1. Using Bartlett’s test, 
the requirements for homogeneity of vari- 
ance are not met in the case of both the 
Rejection Index and number of meat bars 
consumed. Over-all chi squares indicate 
significant differences among the various 
conditions for both “made sick” and intention
to eat the ration in the future
“whenever hungry.” 

The heterogeneity of variance for Re- 
jection Index and number of meat bars 
consumed seems to be due primarily to 
the small dispersion in Experimental 6. 
Table 2 presents the F ratios between Ex- 
perimental 6 and each other condition. It 
will be noted that all of the F ratios are 
significant at the .05 level or better for 
Rejection Index. Only the F ratio between 
Experimental 2 and Experimental 6 fails 
to reach significance for number of meat 
bars consumed. In general, then, it may 
be concluded that all of the conditions are 
more erratic in their effects than Exper- 
imental 6. 

Using the method described by Edwards 
(3, pp. 272-274) to correct for variance, 
direct tests were made to compare the 
means for Experimental 6 with the means 
for each other condition. The t ratios thus 
obtained are presented in Table 2. Using 
Rejection Index as the criterion, Exper- 
imental 6 appears to produce greater 
acceptability than the other conditions 
except Experimental 4 (Group Explanation).
Using number of meat bars consumed
as the criterion, Experimental 6
achieved results significantly superior to
all conditions except Experimental 3 (Information)
and the Control Condition.

Note. Number of Ss in Exp. 6 was reduced by eliminating one crew whose instructor was replaced by an unbriefed
instructor after the beginning of the experiment.

TABLE 2

* Significant at the .05 level or better.

Further direct tests were made by com- 
paring the Control Condition with each 
other condition. As already shown, Exper- 
imental 6 achieved significantly better 
results than the Control Condition using 
the Rejection Index as the criterion. Ex- 
perimental 2 (Good Example) and Exper- 
imental 5 (Individual Explanation), how- 
ever, appeared to produce significant 
negative effects (t ratios = 3.154 and 
2.605 respectively, both significant at bet- 
ter than the .05 level). 

When direct tests are made by applying 
chi-square analysis to the “made sick” criterion,
the results are similar to those obtained
for number of bars consumed.
Experimental 6 again is superior at the
.05 level or better to all conditions except
the Controls and Experimental 3. Using 
intention of using the ration in the future 
as the criterion, Experimentals 3 and 4 
produce results on a par with Experi- 
mental 6. 


Several features concerning Experi- 
mental 3 (giving objective information 
about the value of the ration and describing
methods of preparation) need to be
noted. This experimental condition was 
accompanied by slightly higher mean con- 
sumption and willingness to eat the ration 
“whenever hungry” than even Experi- 
mental 6. The differences, however, fall 
far short of statistical significance. Using 
the latter criterion to compare Experi- 
mental 3 with the Controls, however, re- 
sults obtained by Experimental 3 are su- 
perior (chi square = 7.31, significant at 
better than the .01 level). Experimental 
3, however, appears to be quite erratic 
in its effects as indicated by the rela- 
tively large standard deviations of Re- 
jection Index and number of meat bars 
consumed. 
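The significance level quoted for the chi square of 7.31 can be checked directly. The sketch below assumes one degree of freedom, as would apply to a 2 x 2 table (Experimental 3 versus Control, by willingness to eat the ration "whenever hungry"); the text does not report the degrees of freedom, so this is an assumption.

```python
import math


def chi_square_p_df1(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


p_value = chi_square_p_df1(7.31)
print(round(p_value, 4))
```

With one degree of freedom a chi square is the square of a standard normal deviate, which is why the complementary error function gives the tail probability.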


DISCUSSION


In interpreting the results of this study, 
inescapable difficulties of experimental re- 
search in this area need to be made ex- 
plicit. First, the Ss under the control 
condition cannot be regarded as “un- 
trained.” They were subjected to varying 
degrees and kinds of influence. Question- 
naire responses indicated that all of the 
control instructors conducted indoctrina- 
tion concerning survival rations. It might 
even be argued that instructors in the ex- 
perimental conditions were unpracticed 
and perhaps unskilled in the techniques 
which they were asked to use. It is cer- 
tainly not contended that the instructors 
of the experimental groups were perfect in 
their adherence to the technique assigned. 
Nevertheless, the checks made indicated 
reasonable adherence to the assigned con- 
dition. 

In general, the results of this study 
support the leads obtained from the pre- 
vious studies to which reference has been 
made. It is interesting to note that the 
two methods having significant boomerang 
effects are those relying most heavily on
personal influence. A number of expla- 
nations might be advanced. The expla- 
nation which best satisfies the authors is 
that the boomerang effect resulted from 
the phenomenon of “negative identification”
discussed by Torrance and Ziller*
in an earlier paper. According to this ex- 
planation, trainees perceive instructors as 
different from themselves and different in 
ways which prevent close identification. 
The trainee is a member of an aircrew. 
The instructor is not. He is an “earth- 
bound” man. The instructor is something 
of a woodsman and is comfortable in the 
out-of-doors; usually, the trainee is not 
and frequently cannot imagine any “nor- 
mal” person as being. The instructor is 
relatively young and in outstanding phys- 
ical condition; usually the trainee is older 
and in comparatively poor physical con- 
dition. Thus, there appears to be the basis 
for negative identification and the adop- 
tion of behavior opposite that personally 
recommended by the instructor. These 
two techniques may also be regarded as 
“indirect” attempts to influence in com- 
parison with the more “direct” approaches 
employed in Experimentals 3 and 6. 

The experimenters’ first impulse upon 
examining the results concerning the su- 
periority of Experimental 6 was to reject 
them. Every attempt, of course, had been 
made in advance to maintain as rigorous 
controls as possible. The sampling, the 
indoctrination of instructors, and the col- 
lection of the criterion data had been ac- 
complished as carefully as possible. The 
instructors did not see the completed 
blanks and Ss were not required to sign 
their names, so there was little chance of 
threat to the trainee. Someone suggested, 

* Torrance, E. P., & Ziller, R. C. Nega- 
tive identification in groups as a function of 
personality differences. Reno, Nevada: 
Survival Methods Branch, Air Force Per- 
sonnel and Training Research Center, Stead 
Air Force Base, March 1956. (Laboratory 
Note CRL-LN-210.) 
however, that trainees in Experimental 6
might have buried or destroyed some of 
their bars in order to make a good impres- 
sion upon the instructor, since he was 
grading them on their use of it. Upon in- 
vestigation, however, it was ascertained 
from independent witnesses that some of 
the crews in Experimental 6 had ex- 
hausted completely their supply of the 
ration and had bartered additional bars 
from other crews. For example, one crew 
bartered 33 additional bars and another 
20 from other crews which did not con- 
sume their supply. The experiment was 
even replicated on another sample with 
essentially the same results. 

Again, a number of alternative ration- 
ales might be advanced to explain the 
superiority of Experimental 6. Some 
might argue that men in our culture have 
been conditioned to respond favorably to 
this mildly coercive technique. If this 
were the only explanation, however, one 
would expect more evidence of “behavior 
without conviction” than is apparent. In 
the light of previous studies, the authors 
would argue that this is a simple and 
direct technique which is superior to in- 
direct types of influence. Survival ration 
indoctrination is made an integral part 
of training and given its proper impor- 
tance. The ration takes on meaning in 
terms of training and preparation for pos- 
sible future emergencies and/or extreme 
conditions. It is no longer “just something 
to eat during training.” This type of in- 
fluence attempt places the instructor in 
an “official” rather than a “personal” role 
and he is probably more acceptable and 
influential in such a role. 

It should not be concluded that in- 
structors should avoid the “good example” 
and other techniques of personal influence. 
According to our interpretation, however, 
such techniques are likely to boomerang 
if trainees identify negatively with the 
instructor. Even Experimental 4 (Group 
Explanation) is probably influenced by
this phenomenon. The rationale for Ex- 
perimentals 4 and 5 was taken from 
Stefansson’s experiences in indoctrinating 
members of his exploration parties con- 
cerning “arctic hysteria” (7). He main- 
tained that newcomers to the Arctic were 
simply not bothered by “arctic hysteria,” 
if they were given a satisfactory explana- 
tion of its psychological basis. It is likely 
that these young explorers identified 
strongly with Stefansson, accepted his 
explanation and were influenced by it. 

The findings are probably applicable to 
situations in which instructors need to 
influence attitudes and other behaviors 
with strong emotional overtones. In gen- 
eral, it would appear that instructor at- 
tempts to influence should be of the 
direct, “take-it-or-leave-it” variety and 
should be made in the instructor’s “official” 
rather than “personal” role. Although the 
influence of associates may be far stronger 
than that of instructors, the findings of 
this study do suggest that instructors may 
play significant roles in influencing atti- 
tudes and other behaviors having strong 
emotional overtones and that this can be 
a fruitful area of research. It is possible 
that the findings of this study can be 
generalized to other conceptually similar 
situations where it is desirable to influence 
attitudes and behavior, particularly in ed- 
ucational situations. It is also likely that 
some of the findings may apply to influence 
situations in such activities as selling. The 
findings are quite in accord with theories 
which have been developed in the past 
decade concerning the superiority of “low- 
pressure” sales techniques. Naturally, all 
of these findings need to be tested in other 
situations. 


SUMMARY


A sample of 427 aircrewmen partici- 
pating in a survival exercise were divided 
randomly into seven groups (six exper- 


imentals and one control). Crew instruc- 
tors of the experimental crews were re- 
quested to conduct the survival-ration 
indoctrination according to specific in- 
structions. Using four criteria of accept- 
ance of the ration, an experimental con- 
dition making the food indoctrination a 
regular part of the training accompanied 
by evaluation tended to produce superior 
results. Promising results were also ob- 
tained from a “low-pressure” technique 
relying chiefly upon objective information 
and straight-forward instructions concern- 
ing preparation. Significant negative ef- 
fects were obtained from conditions rely- 
ing upon personal persuasiveness, setting 
an example, and the like. 

CHARACTERISTICS IN GIFTED CHILDREN

Terman’s study of the gifted has shown
that, in general, highly intelligent children,
in addition to being larger and healthier,
are also somewhat better adjusted socially
than the average child. His gifted group,
including those who had been accelerated 
in school, carried these advantages into 
adult life (3). More recently, the Ford 
Foundation’s Fund for the Advancement 
of Education found that a group of gifted 
children coming to college two years early 
had adjusted as well socially and emo- 
tionally to college life as had their class- 
mates (2). 

At present there is considerable dis- 
cussion in educational circles of the ad- 
vantages and disadvantages of accelera- 
tion and special grouping as administrative 
tools in meeting the needs of gifted chil- 
dren. Many school administrators look 
with disfavor on these techniques. When 
asked why, they usually give a reply 
which implies that the social adjustment 
of gifted children is rather fragile. Is 
this fear justified? Are gifted children 
more often or less often subject to se- 
vere maladjustment than other children? 

The purpose of this study is to examine 
the overlapping of talents and maladjust- 
ments in a group of 1015 public school 
children in late childhood and early ado- 
lescence. The research is part of a 10- 
year action-research project being carried 
out by the Committee on Human De- 
velopment of the University of Chicago. 


PROCEDURES 


The population of the study comprised 
the entire public school population of the 
fourth and sixth grades in a Midwestern 
city of 45,000 in the school year 1951-52, 
the first year of the study. For each child
included in the population, the following
characteristics were measured: aggressive 
maladjustment, withdrawn maladjustment, 
social leadership ability, artistic talent, 
and intellectual ability. Tests designed to 
measure all these characteristics were ad- 
ministered during the first year of the 
study. The tests measuring the first three 
characteristics were readministered dur- 
ing the second and fourth years of the 
study. Children for whom test informa- 
tion was incomplete were excluded from 
this study. 

Two tests were used in determining 
aggressive maladjustment, withdrawn mal- 
adjustment, and social leadership ability. 
One is the “Who Are They?” (W.A.T.), 
a sociometric instrument based on chil- 
dren’s evaluations of their peers with re- 
spect to these three behavioral character- 
istics (1). A child’s leadership score was 
determined in response to questions such 
as, “Who are the leaders, the leaders in 
several things?” “Of the people you run 
around with, who are the ones who come 
up with good ideas of interesting things 
to do?” Aggressiveness was determined by 
nominations to questions such as, “Who 
are the boys and girls that seem to be 
against everything that is suggested—the 
gripers?” “Who are the bullies, the boys 
and girls who try to push others around?” 
The following questions are typical of those 
contributing to the withdrawn score, “Who 
are the ones that are too shy to make 
friends easily? It is hard to get to know 
them.” “Who are the boys and girls who 
usually come and go alone and stay by 
themselves most of the time, even though 
they aren’t trouble makers?” 

The other instrument used to measure 
aggressiveness, withdrawnness, and leadership
was the “Behavior Description Chart”
(B.D.C.), a forced-choice teacher rating 
instrument. Here teachers had to pick 
the items “most like” and “least like” a 
given child in a series of 10 groups of five 
statements each, such as the following: 

A. Other people find it hard to get 
along with him. 

B. Is easily confused. 

C. Other people are eager to be near 
him or on his side. 

D. Is usually willing to go along with 
the group. 

E. Interested in other people’s opinion 
and activities. 

In the foregoing pentad, if A was thought 
to be the statement “most like” this 
child, this contributed to his aggressive 
score. If B was thought to be most typical, 
this contributed to his withdrawn score. 
Item C is a leadership item, and D and E 
are not scored since they are presumed to 
be typical of average children. Similarly 
a “least like” nomination for A, B, or C 
subtracted from the child’s score on that 
variable. 
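
The scoring rule for the sample pentad can be sketched in a few lines of Python. This is only an illustration: the article does not state the weights, so a value of +1 for a "most like" choice and -1 for a "least like" choice is assumed, and the names `ITEM_KEY` and `score_pentad` are hypothetical.

```python
# Key for the sample pentad: statement letter -> variable it counts toward.
# D and E are unscored because they are presumed typical of average children.
ITEM_KEY = {"A": "aggressive", "B": "withdrawn", "C": "leadership"}


def score_pentad(most_like, least_like, key=ITEM_KEY):
    """Return the contribution of one pentad to each of the three scores.

    Assumption: +1 for a "most like" choice and -1 for a "least like"
    choice; the article does not give the actual weights.
    """
    scores = {variable: 0 for variable in key.values()}
    if most_like in key:
        scores[key[most_like]] += 1
    if least_like in key:
        scores[key[least_like]] -= 1
    return scores


# A teacher marks A as "most like" and C as "least like" the child.
example = score_pentad("A", "C")
```

Summing these contributions over the 10 pentads would give raw scores of the kind the chart produces.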

Each individual was given a percentile 
score for aggressiveness, withdrawnness, 
and social leadership ability on each of 
the two tests administered in each of the 
three years. Because high scores for one 
year might be unduly affected by a tem- 
porary upset in the child’s life or an 
atypical relationship with one of his teachers,
it was thought best to add the six
percentile scores obtained from the two
tests and divide by six to get a local
mean percentile score. This was done for
each of the three behavioral character-
istics.
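
As a small illustration of this averaging, the sketch below computes a local mean percentile score from six values (two instruments in each of three test years). The function name and the sample percentiles are invented for the example and are not data from the study.

```python
def local_mean_percentile(scores):
    """Average six percentile scores: two instruments x three test years.

    `scores` maps (instrument, study_year) -> percentile score.
    """
    if len(scores) != 6:
        raise ValueError("expected six percentile scores (2 tests x 3 years)")
    return sum(scores.values()) / 6


# Hypothetical aggressiveness percentiles for one child (not study data).
child_scores = {
    ("WAT", 1): 90, ("WAT", 2): 80, ("WAT", 4): 85,
    ("BDC", 1): 70, ("BDC", 2): 75, ("BDC", 4): 80,
}
mean_score = local_mean_percentile(child_scores)
```

Averaging over years in this way damps the effect of a single unusual year, which is the reason the text gives for the procedure.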

The scores from the first two years of 
testing have been utilized for a reliability 
study. Product-moment correlations be- 
tween the two sets of percentile scores are 
reported in Table 1. 

It should be remembered that this is a 
severe reliability measure, since in the 
second year the children were in different 
classrooms, with different teachers, and 
with from 25 to 60 per cent turnover in 
classroom membership. 

It was noticed that the children who 
ranked high in any one category generally 
ranked in the same category on subsequent 
tests, but that considerable shifting occurred
in the relative positions of the low-ranking
children. Measured leadership ability
remained more constant from one year to 
the next than did the maladjustment 
characteristics. 

For all three characteristics, the top 
7-10 per cent of the children received half 
of all the nominations on the W.A.T. 
Thus, this instrument differentiates quite 
clearly among those children displaying 
each characteristic to a high degree, but 
does not differentiate among those who 
seldom display the characteristic being 
measured. The B.D.C. yielded a rather 
similar distribution of scores. 

Intellectual talent was determined 
through use of both tests of “general” in- 
telligence and tests of such “specific mental 
abilities” as could be measured in children 
of 10 or 12 years of age. Also an effort was 
made to include some tests which were 
thought to be more “culture-fair”; that 
is, tests which did not discriminate against 
the children of lower socioeconomic status 
groups. 

The following tests were used for each 
child: the Science Research Associates
Primary Abilities Test (P.M.A.) for ages
7-11, the Davis-Eells Games, the Goodenough
Draw-A-Man Test, the Thurstone
Concealed Figures Test, and the verbal,
spatial, and reasoning subtests of the 
Chicago P.M.A. for ages 11-17. 

The percentile scores on the seven in- 
tellectual measures were averaged. This 
was a rather arbitrary decision, but it 
might be said that the use of a multiple- 
regression equation was discarded since 
this method requires an accepted, inde- 
pendent criterion of talent with which the 
screening instruments could be correlated. 
Academic achievement test scores or aca- 
demic grades could have been used as cri- 
teria, but there was little reason to suppose 
that they would have been better than the 
test score itself as a criterion. 

Artistic talent was determined by ask- 
ing a group of local artists to rate four 
pictures drawn by each child. These pic- 
tures were: a classroom as seen from the 
doorway, a landscape, a free assignment
to draw the child’s favorite subject, and
the Goodenough Draw-A-Man Test scored
with different criteria from those used to
score it as an intelligence test.

After the testing had been completed, 
the 10% of the total group displaying 
each of the five characteristics to the 
highest degree were set aside, and it is 
these top 10% groups which will be in- 
vestigated in this study. 
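
The selection step can be sketched as follows. The article does not say how the group size was rounded or how ties were broken, so rounding up and breaking ties by child id are assumptions, and `top_group` is a hypothetical name.

```python
def top_group(scores, percent=10):
    """Return the ids of the children scoring in the top `percent` per cent.

    `scores` maps child id -> score on one characteristic.
    Assumptions: the group size is rounded up, and ties are broken by id.
    """
    size = (len(scores) * percent + 99) // 100  # integer ceiling
    ranked = sorted(scores, key=lambda child: (-scores[child], child))
    return set(ranked[:size])


# Hypothetical scores for 20 children; child i simply scores i.
demo_scores = {child: child for child in range(1, 21)}
demo_top = top_group(demo_scores)
```

Under these assumptions a population of 1015 gives groups of 102, and the overlap between two characteristics is the intersection of the two sets returned.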


RESULTS


Table 2 points out that children who are 
talented in one area are quite likely to be 
talented in other areas, but are quite un- 
likely to be seen as highly maladjusted. 
Chi square was used in determining the 
statistical significance of the differences 
between observed and expected frequencies 
of overlapping among the five character- 
istics. 

TABLE 2
OVERLAPPING OF TALENT AND MALADJUSTMENT CATEGORIES
(1015 children)

Table 2 shows that:

1. Social leadership ability is positively
related to the other talents and negatively
related to the two maladjustment characteristics.
There are 76 instances of overlapping
between social leadership and the
other talent areas, but no overlapping with
the two maladjustment areas.

2. Intellectual talent is significantly re- 
lated to the other talents and is almost as 
surely negatively related to the maladjust- 
ment characteristics. There are 78 in- 
stances of overlapping with the other two 
talent areas, but only 6 instances of over- 
lapping with the two maladjustment char- 
acteristics. Intellectual talent and social 
leadership ability overlapped more than 
four times as often as would be expected 
on the basis of chance occurrence. 

3. Artistic talent is highly related to 
the other talent areas, but while there is 
a negative relationship between artistic 
talent and the maladjustment character- 
istics, this relationship is not statistically 
significant. There are 64 instances of over- 
lapping with one of the talent areas, while 
there are 12 instances of overlapping with 
one of the maladjustment categories. 

4. The overlapping between withdrawn 
and aggressive maladjustment is not sta- 
tistically significant. 
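
The chance expectation behind these comparisons can be worked through briefly. If membership in two top-10% groups were independent, about 1015 x 0.10 x 0.10, or roughly 10, children would fall in both. The sketch below is illustrative only: the group size of 102 is an assumption, the figure of 45 joint intellectual-leadership cases is inferred from the three totals in the list (76, 78, and 64) on the assumption that each counts pairwise overlaps, and the chi-square shown is the ordinary Pearson statistic for a 2 x 2 table without continuity correction, which may differ from the form the author used.

```python
def expected_overlap(n_total, n_a, n_b):
    """Expected size of the overlap if the two memberships are independent."""
    return n_a * n_b / n_total


def chi_square_2x2(n_total, n_a, n_b, n_both):
    """Pearson chi-square (1 df, no continuity correction) for a 2 x 2 table."""
    a = n_both                          # in both groups
    b = n_a - n_both                    # in group A only
    c = n_b - n_both                    # in group B only
    d = n_total - n_a - n_b + n_both    # in neither group
    numerator = n_total * (a * d - b * c) ** 2
    denominator = n_a * (n_total - n_a) * n_b * (n_total - n_b)
    return numerator / denominator


N_CHILDREN = 1015
GROUP_SIZE = 102   # assumption: 10% of 1015, rounded up
OBSERVED = 45      # inferred: (76 + 78 - 64) / 2, not stated in the article

expected = expected_overlap(N_CHILDREN, GROUP_SIZE, GROUP_SIZE)
ratio = OBSERVED / expected
chi_square = chi_square_2x2(N_CHILDREN, GROUP_SIZE, GROUP_SIZE, OBSERVED)
```

On these assumptions the ratio of observed to expected overlap exceeds four, in line with the statement in item 2.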

Since there is a possibility that only the 
extremely intellectually gifted have severe 
adjustment problems, it was decided to 
examine the 51 children with the highest 
intellectual scores, the top 5%. Only two 
of the 51 children were in the top 10%
in one of the maladjustment categories,
while there were 41 instances of overlapping
with the talent areas. Thus, the
top 5% in intellectual talent had more
overlapping with the other talents and less
overlapping with the maladjustment categories
than did the second 5%.

For readers who are interested in the
correlations of these characteristics
throughout their entire range, Table 3
presents these correlations for 273 of the
children who were in the sixth grade at
the beginning of the study. It is believed
that the correlations for the entire 1015
children would be quite similar.


TABLE 3
INTERCORRELATIONS OF TALENT AND MALADJUSTMENT CATEGORIES

Variables      Intellectual  Leadership  Withdrawn  Aggressive
Intellectual        --          .49*       -.45*       .05
Leadership         .37*          --        -.76*      -.05
Withdrawn         -.28*        -.61*         --       -.22
Aggressive        -.11         -.23*       -.24*        --

Note. Coefficients for girls (N = 143) above diagonal; for boys (N = 130), below diagonal.
* 1% level of confidence

In interpreting Table 3 it must be re- 
membered that the tests used to measure 
leadership and the maladjustment char- 
acteristics were set up to screen out those 
children displaying a given characteristic 
to a high degree and were not intended 
to differentiate between children display- 
ing these characteristics to a lesser degree. 

The table indicates that intellectual 
ability and social leadership ability are 
significantly correlated and that both are 
negatively related to withdrawnness. While 
the negative relationship between with- 
drawnness and aggressiveness is statisti- 
cally significant, it is not extremely high. 
Except for the negative relationship be- 
tween aggressiveness and leadership for 
boys, there are no statistically significant 
relationships between aggressiveness and 
the talent variables. Artistic talent was not 
quantified throughout the entire range and 
thus could not be correlated with the other 
variables. 


SUMMARY


The top 10% groups in intellectual 
talent, social leadership ability, artistic 
talent, aggressive maladjustment, and 
withdrawn maladjustment were examined. 
It was found that children who were highly
gifted in one of the three talent areas were 
quite likely to be talented in other areas, 
and quite unlikely to be seen as highly


ATTITUDE CHANGE THROUGH UNDIRECTED
GROUP DISCUSSION 


K. M. MILLER AND J. B. BIGGS²
University of Tasmania 


Studies of attitude change have empha- 
sized many variables; the present study 
was concerned with the effectiveness of 
free group discussion about racial groups 
when the discussion groups are sociometrically
structured. Some writers (3, 8, 11)
have stressed the importance of
using sociometric structure as an aid to 
effective classroom work, suggesting that 
the educational process is more efficient 
when groups are composed of mutually 
attracted members, i.e. when groups are 
cohesive. 


DESIGN OF STUDY


Two types of groups were selected from 
a class—psychegroups considered high, and 
sociogroups low in cohesion. Attitude 
change was assessed by testing with an 
attitude scale before and after a period of 
undirected discussion about a number of 
racial groups. The stability of any change 
was assessed by a third test some weeks 
later. A control class completed the attitude
scale at the same times but without
intervening discussion. 

The interval between the first and second
tests was designed to minimize memory
of responses to the first. School vacation 
prevented the interval between the second 
and third tests being identical with the 
first interval. On no occasion were Ss in- 
formed that subsequent tests would be 
given. 

TECHNIQUES 

Sociometric. The conventional form of
the Moreno technique was used, Ss being
asked to write the names (up to five for
each category) of those classmates next
to whom they would like to sit and would
not like to sit.

2 Now at National Foundation for Educational
Research, London.

Social Attitudes. A Bogardus type scale 
was selected as the most suitable both for 
repeated measurement and for showing 
changes after discussion. The form used 
was similar to the Zeligs and Hendrick- 
son (13) modification but had been inde- 
pendently derived by the senior author 
for a previous study. The steps were: 

I would like to have live in my home.
I would like to have as a close friend.
I would like to go for a holiday with.
I would like to have in my sports team.
I would like to work with in school.
I would like to have live in my street.
I would like to have live in my country.

So that the list of racial groups would be
meaningful for the Ss, 14 were selected 
either on the basis of percentage of na- 
tional group among migrants to Australia 
or for historical reasons, e.g., English and 
Japanese. The groups were: American, 
Chinese, Dutch, English, German, Indian, 
Irish, Italian, Japanese, Jewish, Negro, 
Polish, Russian, Balt. 

Scoring was by the Zeligs method (12)
whereby each positive response counted
one point.
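
The one-point-per-positive-response rule can be sketched as below. With seven steps and 14 groups the highest possible total under this rule is 98. The function name and the sample responses are hypothetical and are not taken from the study.

```python
N_STEPS = 7     # steps on the social distance scale
N_GROUPS = 14   # national and racial groups rated
MAX_SCORE = N_STEPS * N_GROUPS


def social_distance_score(responses):
    """Total score for one subject: one point per positive response.

    `responses` maps group name -> list of 7 booleans, one per step.
    """
    return sum(
        sum(1 for accepted in steps if accepted)
        for steps in responses.values()
    )


# Hypothetical responses for two of the 14 groups (not study data).
sample = {
    "Dutch": [True] * 7,
    "Irish": [False, False, True, True, True, True, True],
}
sample_score = social_distance_score(sample)
```

A higher total therefore indicates acceptance at more steps, that is, less social distance.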


SUBJECTS

Two third year secondary school classes 
of 26 and 16 boys respectively were se- 
lected. Of the larger class 24 members 
with a mean age of 177 months, SD 8.5, 
were experimental Ss while all of the 
smaller class with a mean age of 180 
months, SD 9.0 months, were control Ss. 
The difference in age was not significant. 


SELECTION OF DISCUSSION GROUPS


The method of analysis of sociometric 
data suggested by Clark and Maguire (2)
was used, supplemented in the final choice
of groups by reference to sociograms.
Three friendship or psychegroups and
three neutral or sociogroups, each containing
four boys, were selected. The
psychegroups were chosen so that within
each (i) all members expressed strong
(1, 2, 3) choices for each member of the
group, (ii) each member received at least
one mutual choice, and (iii) no one was
rejected by other members of the group.
The composition of the sociogroups was 
such that no member had expressed either 
acceptance or rejection of any other 
member. 
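
A rough sketch of how such criteria might be checked against sociometric data follows. It is not the Clark and Maguire procedure. It covers only criteria (ii) and (iii) for psychegroups, since criterion (i) depends on the rank of each choice, and the sociogroup rule; the data layout and all names are assumptions made for the example.

```python
from itertools import permutations


def is_sociogroup(members, choices, rejections):
    """True if no member has chosen or rejected any other member."""
    return all(
        b not in choices.get(a, set()) and b not in rejections.get(a, set())
        for a, b in permutations(members, 2)
    )


def meets_psychegroup_criteria(members, choices, rejections):
    """Check criteria (ii) and (iii) only.

    (ii) every member has at least one mutual choice within the group;
    (iii) no member is rejected by another member of the group.
    """
    no_rejection = all(
        b not in rejections.get(a, set())
        for a, b in permutations(members, 2)
    )
    each_has_mutual = all(
        any(
            b in choices.get(a, set()) and a in choices.get(b, set())
            for b in members if b != a
        )
        for a in members
    )
    return no_rejection and each_has_mutual


# Hypothetical choices among six boys; nobody rejects anybody.
choices = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}, "e": set(), "f": set()}
rejections = {}
```

In this toy data the boys a, b, c, d satisfy the two psychegroup checks, while a, e, f, c qualify as a sociogroup because none of them has named another member.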


PROCEDURE 


The task was presented as part of a 
general study being conducted in several 
countries, to find out how children thought 
about people in their own and other 
countries.* To encourage frankness, Ss
were assured that all replies were confi- 
dential and would not be seen by anyone 
in the school. The sociometric and social 
distance scales were then administered to 
both control and experimental classes. 

The initial administration of the social 
distance scale was as recommended by 
Bogardus (1), the E reading the items
at three-second intervals. On the later
occasions Ss were allowed to complete the
scale at their own rate.

Four weeks later the group discussions
were begun, friendship and neutral groups
working alternately. Each group was instructed
only after assembling, and at the
conclusion of the discussion the members
were asked not to discuss the activity
with the other boys. The E introduced the
discussion, explaining that he would like
each member to say something about a
number of racial groups. The discussion
period lasted approximately 30 minutes,
two minutes being allowed for each of the
14 races, though discussion of any one
group was not abruptly terminated.

* A report is in preparation on the international
study, the design of which is described
in (7).

During the discussion, E was passive 
and nondirective, averting any questions 
put directly to him. Following the dis- 
cussion the social distance scale was re- 
administered. After half of the group dis- 
cussions had been completed the scale was 
given a second time to the control group. 

As a check on the stability of any change 
the scale was given a third time two weeks 
after the last group discussion to both 
control and experimental classes. 


RESULTS AND ANALYSIS 


Comparison of control and experimental 
groups. A t test of initial scores revealed 
no significant difference (t = 1.10, 38 df), 
thus showing that the mean attitude level 
of the two could be considered equivalent. 

The effect of discussion. The mean scores 
for each of the three sections—psyche- 
groups, sociogroups, and control class—on 
first and second administrations were 
tested for differences. The differences for 
the friendly and neutral Ss were signifi- 
cant beyond the one per cent level (t = 
3.43 and 3.16, 11 df respectively) while the 
difference for the controls was not significant
(t = 0.32, 15 df).
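
The degrees of freedom reported here (11 for 12 Ss, 15 for 16 controls) are what a t test on paired differences would give, so a sketch of that computation may be helpful. That the authors used exactly this form is an inference, and the scores below are invented for illustration.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for the mean of the paired differences after - before."""
    if len(before) != len(after):
        raise ValueError("both administrations need the same subjects")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical scale totals for four subjects (not study data).
first_test = [10, 12, 14, 16]
second_test = [11, 14, 16, 16]
t_value, df = paired_t(first_test, second_test)
```

The resulting t would then be referred to a t table at the stated degrees of freedom.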

Differences between second and third
administrations. Inspection of Table 1
shows that the mean scores for the third
administration are lower than immediately
after the discussion period in the case of
the friendly and neutral Ss. These differences
were not significant for any of the
three sections (t values of 2.14, 1.27, and
0.86 for friends, neutrals, and controls
respectively).


TABLE 2
NUMBER OF REMARKS OF EACH TYPE MADE DURING DISCUSSION

              Favourable  Unfavourable  Neutral  Total
Psychegroups     151          133          42     326
Sociogroups      155          129          39     323

Differences between first and third ad- 
ministrations. A similar analysis was made 
of differences between first and third 
sets of scores. The t test analysis showed 
that final scores were significantly greater 
than the initial scores for both psyche- 
group and sociogroup Ss; at the two per 
cent level for the former and at the five 
per cent level for the latter. Again the 
differences in the control class scores were 
not significant (t = 0.33). 

Quantitative aspects of the discussions. 
The remarks made by each S were re- 
corded and categorized as favorable, un- 
favorable or neutral towards the race 
under discussion. The total remarks are 
shown in Table 2 where it is seen that the 
total number of remarks and the distribu- 
tion according to category are approxi- 
mately equal for the psychegroups and 
sociogroups. Examination showed that in 
the psychegroups remarks were somewhat 
more evenly spread over all members than 
in the sociogroups. 


Discussion 


The analysis has shown that the members
of both psychegroups and sociogroups
show more tolerance (decreased social
distance) after free undirected discussion
of some characteristics of different racial
groups. No such change is shown by con- 
trol Ss tested after the same interval of 
time. The amount of initial change appears 
to be unrelated to type of group as the 
mean difference between friends and neu- 
trals is not significant. 

When the same scale is applied again 
after an interval of two weeks the mean 
scores of members of both types of ex- 
perimental group are closer to the mean 
scores in the first testing than they were 
on the second. The amount of this re- 
version is, however, not statistically sig- 
nificant, indicating that the favorable 
change engendered by the discussion was 
fairly stable over this short period. Fur- 
ther confirmation was provided by the 
comparison of the initial and final scores 
which were significantly different for both 
psychegroup and sociogroup subjects, at the
two and five per cent levels respectively.

Few, if any, investigators have sug- 
gested that free undirected discussion 
about racial groups would lead to the 
measurable changes demonstrated in this 
study. Some investigators (6, 9) indicate 
that attempts to change attitudes are more 
successful when Ss are members of natu- 
rally working groups. While a class is in 
some respects a functioning group it has 
within it a number of groups which are 
more cohesive than the class as a whole. 
Thus it would not have been surprising to 
find the members of the psychegroups 
showing greater and more stable change 
than the sociogroup members. The results 
of the present study are not in accord with 
such expectations as both friends and 
neutrals show (approximately) equally 
significant changes and stability of change.
Moreover, they did not differ from each
other in degree of change throughout.*

* Unrelated t tests on difference scores
between psychegroup and sociogroup members
at each stage were nonsignificant.

The present findings are, however, in
keeping with the suggestion that close
friendships may lead to better communication
and wider participation in the
group (6). The evidence for better com- 
munication is provided by notes of the 
actual discussion sessions. The climate in 
the psychegroups was freer and more 
lively and discussion more spontaneous 
than that in the sociogroups where dis- 
cussion tended to be more formal and 
reserved, though no less frequent, and 
not different in the proportion of favor- 
able, unfavorable and neutral remarks. 
Thus although quantitative differences be- 
tween friend and neutral groups were 
not discovered, there is a difference in the 
way in which the quantitative result was 
achieved. 

Some investigators (8, 10) have con- 
sidered such changes in terms of group 
conformity, suggesting that there would 
be a greater tendency for members of 
sociogroups, being less cohesive than 
psychegroups, to establish some norm or 
common position. Others (4, 9) have con- 
sidered such changes as a function of 
security—insecurity and personality ad- 
justment—maladjustment suggesting that 
less secure, less well-adjusted persons may, 
as a means of establishing a more secure 
group relationship, change towards a cen- 
tral position. When the results of the 
present study were examined for evidence 
of conformity it was found that for all 
three sociogroups and for none of the 
psychegroups the range of scores after 
discussion was considerably smaller than 
before. 
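
The comparison of ranges can be illustrated with a short sketch. The article gives no threshold for "considerably smaller", so the function below tests only whether the range narrowed at all, and the scores are invented for the example.

```python
def score_range(scores):
    """Range of a group's scale scores: highest minus lowest."""
    return max(scores) - min(scores)


def range_narrowed(before, after):
    """True if the group's scores are less spread out after discussion."""
    return score_range(after) < score_range(before)


# Hypothetical scores for one four-member group (not study data).
before_discussion = [40, 55, 70, 62]
after_discussion = [58, 60, 66, 63]
narrowed = range_narrowed(before_discussion, after_discussion)
```

Applied to each of the six discussion groups, a check of this kind would separate those whose members moved toward a common position from those whose spread was unchanged.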

The relevance of these findings for 
education requires consideration as the re- 
sults seem to be at variance with those 
usually claimed by proponents of the 
sociometric approach. Work of investi- 
gators such as Cunningham, Oeser, and 
Shoobs suggests that group discussion 
would be more effective in the psyche- 
groups than in the sociogroups, whereas 
it has been shown in this study that measured
changes are approximately equal for
both types of groups. Further research is
required to ascertain whether the changes 
are primarily a function of the learning 
process, in one case, and a function of 
personality factors and insecurity in the 
other. 

SUMMARY AND CONCLUSIONS

This study was an attempt to relate 
attitude change to free discussion in
psyche- and sociogroups. Several findings 
are definite while others merit further 
investigation. 

1. Free, undirected discussion about 
racial groups by two types of small groups, 
selected on a sociometric basis, resulted 
in a significant change of attitude irre- 
spective of the type of group. Further, 
this change was relatively stable over a 
short period. 

2. Contrary to expectations from socio- 
metric studies in the classroom, and from 
studies of group structure, no significant 
differences between the quantitative 
changes of friendly and neutral Ss were 
discovered. 

3. Nevertheless, it was suggested that 
the psychological processes in the two 
types of groups might be different and 
that further investigation is necessary 
to show whether the tendency for the 
scores of members of sociogroups to come 
closer to a central position after discussion
is a function of personality adjustment.